LeNet-5
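LeNet-5 is a classic convolutional neural network (CNN) proposed by Yann LeCun et al. in 1998, designed mainly for handwritten digit recognition. It performs well on the MNIST dataset and is an important milestone in the history of deep learning.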
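LeNet-5 Architecture
LeNet-5 consists of the following layers:
- Input layer: a 32x32 grayscale image.
- Convolutional layer C1: 6 filters of size 5x5, output size 28x28x6.
- Pooling layer S2: average pooling, output size 14x14x6.
- Convolutional layer C3: 16 filters of size 5x5, output size 10x10x16.
- Pooling layer S4: average pooling, output size 5x5x16.
- Convolutional layer C5: 120 filters of size 5x5, output size 1x1x120.
- Fully connected layer F6: 84 neurons.
- Output layer: 10 neurons, one for each of the 10 classes.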
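The output sizes in the list above follow from the usual formula for a sliding window, `(size + 2 * padding - kernel) // stride + 1`. The sketch below traces them in plain Python. It assumes no padding, stride 1 for the convolutions, and 2x2 pooling with stride 2; the text does not state these settings, but they are the ones that reproduce the listed sizes. The helper names `conv_out` and `lenet5_shapes` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=32):
    """Trace (layer name, height/width, channels) through C1..C5."""
    # (name, kernel, stride, output channels); pooling settings are assumed.
    layers = [
        ("C1", 5, 1, 6),
        ("S2", 2, 2, 6),
        ("C3", 5, 1, 16),
        ("S4", 2, 2, 16),
        ("C5", 5, 1, 120),
    ]
    size = input_size
    shapes = []
    for name, kernel, stride, channels in layers:
        size = conv_out(size, kernel, stride)
        shapes.append((name, size, channels))
    return shapes


if __name__ == "__main__":
    for name, size, channels in lenet5_shapes():
        print(f"{name}: {size}x{size}x{channels}")
```

With the default 32x32 input, this prints the same sizes as the list: 28x28x6, 14x14x6, 10x10x16, 5x5x16, and 1x1x120.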
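MNIST Handwritten Digit Recognition
MNIST is arguably the most commonly used dataset for learning deep learning.
The dataset contains 70,000 images of handwritten digits: 60,000 training images and 10,000 test images. The training set consists of digits written by 250 different people, half of them high school students and half of them employees. The test set contains handwritten digits in the same proportion, and the writers of the test set and the training set are guaranteed to be different.
Each image is 28x28 pixels, and the dataset stores each image flattened into a one-dimensional vector of 28 x 28 = 784 values.
Each image shows a handwritten digit from 0 to 9 in white on a black background. When stored, black is represented by 0 and white by a floating-point value between 0 and 1.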
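The flattening and scaling described above can be sketched in a few lines of plain Python. This is only an illustration: it assumes the raw pixels are 8-bit values (0 to 255) laid out row by row, and the function name `flatten_image` and the sample image are hypothetical.

```python
def flatten_image(pixels):
    """Flatten a 28x28 grid of 8-bit values into 784 floats in [0, 1]."""
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 image")
    # Row-major order: row 0 first, then row 1, and so on.
    return [value / 255.0 for row in pixels for value in row]


# Hypothetical image: black background with one white pixel at row 3, column 5.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255

vector = flatten_image(image)
print(len(vector))          # 784
print(vector[3 * 28 + 5])   # 1.0
```

The pixel at row `r` and column `c` ends up at index `r * 28 + c` of the vector, which is why the white pixel is found at `3 * 28 + 5`.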
PyTorch Implementation
The LeNet Model
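```python
import torch.nn as nn
import torch.nn.functional as func


class LeNet5(nn.Module):
    """LeNet-5 variant for 1x28x28 MNIST input, using ReLU and max pooling."""

    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 1x28x28 -> 6x24x24
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # 6x12x12 -> 16x8x8
        self.fc1 = nn.Linear(16 * 4 * 4, 120)         # flattened 16x4x4 features
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)                  # one logit per digit class

    def forward(self, x):
        x = func.relu(self.conv1(x))
        x = func.max_pool2d(x, 2)      # 6x24x24 -> 6x12x12
        x = func.relu(self.conv2(x))
        x = func.max_pool2d(x, 2)      # 16x8x8 -> 16x4x4
        x = x.view(x.size(0), -1)      # flatten to (batch, 256)
        x = func.relu(self.fc1(x))
        x = func.relu(self.fc2(x))
        x = self.fc3(x)                # raw logits; the loss applies softmax
        return x
```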
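The `16*4*4` input size of `fc1` comes from feeding 28x28 images through two 5x5 convolutions and two 2x2 poolings: 28 → 24 → 12 → 8 → 4, with 16 channels at the end. As a rough sanity check, the number of trainable parameters can be worked out by hand. The sketch below does the arithmetic in plain Python; the helper names `conv_params` and `linear_params` are made up for this illustration, and each layer is assumed to have a bias term (the PyTorch default).

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel convolution."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


counts = {
    "conv1": conv_params(1, 6, 5),
    "conv2": conv_params(6, 16, 5),
    "fc1": linear_params(16 * 4 * 4, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count}")
print(f"total: {total}")  # 44426
```

If the model class is available, the total should match `sum(p.numel() for p in LeNet5().parameters())`. Most of the parameters sit in `fc1`, not in the convolutional layers.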
Training the Model
Load the data and train the model.
```python
import torch
from torch import nn
from torch import optim
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms

from models import *  # provides LeNet5 and the other architectures

if __name__ == '__main__':
    # Image transformations: convert to single-channel grayscale, then to a tensor
    transform = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.ToTensor()
    ])

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # ImageFolder expects one subdirectory per class; labels follow the sorted directory names
    train_dataset = datasets.ImageFolder(root='./mnist_train', transform=transform)
    test_dataset = datasets.ImageFolder(root='./mnist_test', transform=transform)
    print("train_dataset length: ", len(train_dataset))
    print("test_dataset length: ", len(test_dataset))

    # DataLoader for the training set: batches of 64, reshuffled every epoch
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    print("train_loader length: ", len(train_loader))

    # Initialize the model. Other architectures from models.py (PreNetwork, AlexNet,
    # VGG, GoogLeNet, ResNet, DenseNet, WideResNet) can be substituted here.
    model = LeNet5().to(device)

    # Adam optimizer with default settings and cross-entropy loss
    optimizer = optim.Adam(model.parameters())
    criterion = nn.CrossEntropyLoss().to(device)

    # Train the model for 10 epochs
    num_epochs = 10
    for epoch in range(num_epochs):
        for batch_idx, (data, label) in enumerate(train_loader):
            data, label = data.to(device), label.to(device)

            optimizer.zero_grad()            # clear gradients from the previous step
            output = model(data)             # forward pass
            loss = criterion(output, label)  # compute the loss
            loss.backward()                  # backward pass: compute the gradients
            optimizer.step()                 # update the model parameters

            # Print the loss every 100 batches
            if batch_idx % 100 == 0:
                print(f"Epoch {epoch + 1}/{num_epochs} "
                      f"| Batch {batch_idx}/{len(train_loader)} "
                      f"| Loss: {loss.item():.4f}")

    # Save the whole model object (architecture and weights)
    torch.save(model, 'mnist.pth')
```
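The training script loads `test_dataset` but never uses it. A minimal sketch of how accuracy on that set could be measured is shown below. The function name `evaluate` is hypothetical, and the demo at the bottom uses stand-in data (one-hot rows passed through `nn.Identity`) instead of real MNIST images so that the block runs on its own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()  # evaluation mode (matters for layers such as dropout)
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data, label in loader:
            data, label = data.to(device), label.to(device)
            predicted = model(data).argmax(dim=1)  # class with the largest logit
            correct += (predicted == label).sum().item()
            total += label.size(0)
    return correct / total


if __name__ == "__main__":
    # Stand-in data: each one-hot row is treated as the "logits" for its own class.
    data = torch.eye(10)
    labels = torch.arange(10)
    demo_loader = DataLoader(TensorDataset(data, labels), batch_size=4)
    print(evaluate(nn.Identity(), demo_loader, torch.device("cpu")))  # 1.0
```

In the training script, this could be called after the training loop as `evaluate(model, DataLoader(test_dataset, batch_size=64), device)`.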
Testing on a Single Image
Load a single image.
Load the model and run the test.
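```python
import cv2
import torch
import torch.nn.functional as F
from torchvision import transforms

# Important: even though it is not referenced directly in this script, the model
# class must be importable, otherwise loading the saved model fails.
from models import LeNet5

if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the full pickled model. weights_only=False (PyTorch 1.13+) is needed on
    # recent versions, where torch.load defaults to weights-only loading.
    model = torch.load('./mnist.pth', map_location=device, weights_only=False)
    model = model.to(device)
    model.eval()  # switch the model to evaluation mode

    img = cv2.imread("1.png", 0)  # read the image to predict as grayscale
    if img is None:
        raise FileNotFoundError("could not read 1.png")

    trans = transforms.Compose([
        transforms.ToTensor()
    ])
    img = trans(img)  # HxW uint8 array -> 1xHxW float tensor in [0, 1]
    img = img.to(device)
    # Add a batch dimension: the model expects 4-D input [batch_size, channels, height, width],
    # while a single image is 3-D [channels, height, width]. The result is [1, 1, 28, 28].
    img = img.unsqueeze(0)

    output = model(img)
    prob = F.softmax(output, dim=1)  # probabilities of the 10 classes
    print(prob)
    value, predicted = torch.max(output, 1)  # index of the largest logit
    print(predicted.item())
```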
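The script prints the softmax probabilities but takes the prediction from the raw model output. Both give the same class, because softmax preserves the ordering of its inputs. The plain-Python sketch below illustrates this; the logits are hypothetical values chosen for the example, not output from the trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for the 10 digit classes.
logits = [-3.0, 0.5, 9.0, 1.0, -1.0, -4.0, 0.0, 0.2, 2.0, -8.0]
probs = softmax(logits)

print(argmax(logits))  # 2
print(argmax(probs))   # 2
```

The probabilities sum to 1, and the largest logit always maps to the largest probability, so either tensor can be used to pick the predicted digit.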
Test Results
The output shows that the prediction is correct.
```python
tensor([[1.0468e-24, 2.3783e-21, 1.0000e+00, 2.5184e-12, 1.1491e-20, 1.0807e-24,
         6.6296e-21, 7.4025e-21, 3.6534e-18, 2.3143e-35]], device='cuda:0',
       grad_fn=<SoftmaxBackward0>)
2
```
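Conclusion:
Moving from the earlier feedforward neural network implementation of MNIST to this LeNet implementation, it appears that essentially any classification algorithm can be applied to MNIST.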