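Deep Learning Weekly Study Summary P2 (CIFAR10 Color Image Classification)

🍨 This post is a study log from the 🔗365天深度学习训练营 (365-Day Deep Learning Training Camp)

🍖 Original author: K同学啊 | tutoring and custom projects available

- [0. Summary](#0. Summary)
- [1. Data import](#1. Data import)
- [2. Model construction](#2. Model construction)
- [3. Preparation before training](#3. Preparation before training)
- [4. Defining the training function](#4. Defining the training function)
- [5. Defining the test function](#5. Defining the test function)
- [6. Training process](#6. Training process)
- [7. Visualizing the results](#7. Visualizing the results)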

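0. Summary

Data import: the dataset ships with torchvision. Once it has been downloaded, it is wrapped in DataLoader() from torch.utils.data so that it can be read in batches.

Model construction: the model class has two parts. The initialization part (__init__()) declares every layer of the network, such as the convolution and pooling layers. The forward-propagation part (forward()) defines how the data passes through those layers.

Preparation before training: define the loss function, the learning rate, and an optimizer built with that learning rate (for example SGD, stochastic gradient descent). The optimizer updates the parameters during training to minimize the loss.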

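Defining the training function: the function takes four arguments: the configured DataLoader(), the model, the loss function, and the optimizer. It initializes the running loss and accuracy to 0 and then loops over the DataLoader() one batch at a time. Each batch is passed through the model to get predictions, and the loss function turns the predictions and labels into a loss value. Backpropagation and the optimizer step follow. The gradients can be cleared either before backpropagation or after the optimizer step. The code in this post calls optimizer.zero_grad() right before loss.backward(); calling it at the very start of each batch is the other common placement. Either way, every batch starts accumulating gradients from zero.

Accuracy is obtained by adding up the number of correct predictions over all batches and dividing by the total number of samples. The loss is handled in a similar way: the loss of each batch is accumulated, and the total is divided by the number of batches.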
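The effect of clearing gradients can be seen in a minimal sketch. The single weight `w` and the toy loss `3 * w` are made up for illustration and are not part of this post's model; the only point is that `backward()` adds to the stored gradient unless it is cleared first.

```python
import torch

# One trainable weight; the toy "loss" is 3 * w, so d(loss)/dw is always 3.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes WITHOUT clearing: the gradients add up (3 + 3 = 6).
(3 * w).sum().backward()
(3 * w).sum().backward()
grad_without_reset = w.grad.item()

# Clear the gradient, then run one backward pass: the gradient is 3 again.
optimizer.zero_grad()
(3 * w).sum().backward()
grad_after_reset = w.grad.item()

print(grad_without_reset, grad_after_reset)  # 6.0 3.0
```

Because of this accumulation, the exact position of `zero_grad()` matters less than making sure it runs once between two consecutive `backward()` calls.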

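Defining the test function: compared with the training function, the optimizer argument is dropped, so only the configured DataLoader(), the model, and the loss function are passed in. Inside the loop there is no gradient clearing, backpropagation, or optimizer step; everything else matches the training function.

Training process: set the number of epochs; each epoch is one full pass over the training set. Initialize four empty lists to store the training and test accuracy and loss of each epoch. Call model.train() to enable training mode and run the training function to get the accuracy and loss. Call model.eval() to switch to evaluation mode and run the test function to get the accuracy and loss. Then append the four values to their lists and print them together, which gives the accuracy and loss after each epoch.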

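Saving, loading, and using the model: this part is not covered here, but a trained model will certainly need it, so it has to be added separately. In PyTorch, the usual approach is to save the parameters with torch.save(model.state_dict(), 'model.pth') and load them with model.load_state_dict(torch.load('model.pth')).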
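A minimal sketch of that save/load round trip follows. The tiny `nn.Linear` model and the temporary file path are stand-ins chosen for illustration; the same calls should apply to the CNN built later in this post, as long as the restored object has the same architecture.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model; replace with the real model class in practice.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pth")
    torch.save(model.state_dict(), path)      # save the parameters only

    restored = nn.Linear(4, 2)                # same architecture as the saved model
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.equal(model(x), restored(x))
print(same_output)  # True
```

Saving the `state_dict` rather than the whole model object keeps the file independent of how the class is pickled, at the cost of having to rebuild the model before loading.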

1. Data import

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torchvision
```


C:\Users\chengyuanting\.conda\envs\pytorch_cpu\lib\site-packages\tqdm\auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
```


device(type='cpu')

```python
# Download CIFAR10 through torchvision.datasets; the train flag selects the training or test split
train_ds = torchvision.datasets.CIFAR10(
    'data',
    train = True,
    transform = torchvision.transforms.ToTensor(),
    download = True
)
test_ds = torchvision.datasets.CIFAR10(
    'data',
    train = False,
    transform = torchvision.transforms.ToTensor(),
    download = True
)
```


Files already downloaded and verified
Files already downloaded and verified

```python
# Wrap the datasets in DataLoaders and set a basic batch_size
batch_size = 32
train_dl = torch.utils.data.DataLoader(
    train_ds,
    batch_size = batch_size,
    shuffle = True
)
test_dl = torch.utils.data.DataLoader(
    test_ds,
    batch_size = batch_size
)
```

```python
# Take one batch and check its shape
# The shape is [batch_size, channel, height, width]:
# batch_size is set by us; channel, height, and width are the image's channel count, height, and width.
imgs,labels = next(iter(train_dl))
imgs.shape
```


torch.Size([32, 3, 32, 32])
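The first dimension of that shape is the batch size, which also fixes how many batches one epoch contains. The arithmetic can be sketched without PyTorch, assuming the DataLoader keeps the final partial batch (its default, `drop_last=False`); the helper name `batches_per_epoch` is made up for this example.

```python
import math

def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(50_000, 32)   # CIFAR10 training split
test_batches = batches_per_epoch(10_000, 32)    # CIFAR10 test split
last_train_batch = 50_000 % 32                  # size of the final partial batch

print(train_batches, test_batches, last_train_batch)  # 1563 313 16
```

So `len(train_dl)` should be 1563 and `len(test_dl)` 313, and the last training batch holds 16 images instead of 32.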

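```python
# Visualize the first 20 images of the batch
import numpy as np

plt.figure(figsize = (20,5))
for i,img in enumerate(imgs[:20]):
    # Reorder the axes from (channel, height, width) to (height, width, channel) for imshow
    npimg = img.numpy().transpose((1,2,0))
    plt.subplot(2,10,i+1)
    plt.imshow(npimg)  # cmap is ignored for RGB data, so it is not passed
    plt.axis('off')
plt.show()
```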

More about the dataset:

CIFAR10 contains 60,000 samples. Each sample is a 32×32-pixel RGB (color) image, so it has 3 channels (R, G, and B). The 60,000 samples are split into 50,000 training samples and 10,000 test samples.

CIFAR10 is meant for supervised learning, so every sample comes with a label that identifies what it shows. The dataset has 10 classes, labeled 0 to 9: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
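The label-to-name mapping described above can be written down as a plain list. This is only a sketch: the names `CIFAR10_CLASSES` and `label_to_name` are made up for this example, and torchvision's CIFAR10 dataset object also exposes the names through its `classes` attribute.

```python
# Class names in label order (index 0 to 9), as listed in the paragraph above.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_to_name(label: int) -> str:
    """Map an integer label (0-9) to its class name."""
    return CIFAR10_CLASSES[label]

print(label_to_name(0), label_to_name(3), label_to_name(9))  # airplane cat truck
```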

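2. Model construction

The data shape changes through the network as follows:

3, 32, 32 (input)

-> 64, 30, 30 (after conv layer 1) -> 64, 15, 15 (after pooling layer 1)

-> 64, 13, 13 (after conv layer 2) -> 64, 6, 6 (after pooling layer 2)

-> 128, 4, 4 (after conv layer 3) -> 128, 2, 2 (after pooling layer 3)

-> 512 -> 256 -> num_classes (10)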

```python
import torch.nn.functional as F

num_classes = 10

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extraction
        self.conv1 = nn.Conv2d(3,64,kernel_size = 3)   # first conv layer, 3*3 kernel
        self.pool1 = nn.MaxPool2d(kernel_size = 2)     # pooling layer, 2*2 window
        self.conv2 = nn.Conv2d(64,64,kernel_size = 3)  # second conv layer, 3*3 kernel
        self.pool2 = nn.MaxPool2d(kernel_size = 2)
        self.conv3 = nn.Conv2d(64,128,kernel_size = 3) # third conv layer, 3*3 kernel
        self.pool3 = nn.MaxPool2d(kernel_size = 2)

        # Classifier
        self.fc1 = nn.Linear(512,256)
        self.fc2 = nn.Linear(256,num_classes)

    # Forward pass
    def forward(self,x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        # Flatten turns the multi-dimensional conv output into one vector per sample
        # so that it can be fed to the fully connected layers.
        x = torch.flatten(x,start_dim = 1)
        # The fully connected layers sit at the end of the network and make the
        # final prediction from the extracted features.
        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return x
```

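Formulas:

Convolution output size:

Height: $H_{out} = \left\lfloor \frac{H_{in} - kernel\_size + 2 \times padding}{stride} \right\rfloor + 1$
Width: $W_{out} = \left\lfloor \frac{W_{in} - kernel\_size + 2 \times padding}{stride} \right\rfloor + 1$
Channels after a convolution: the output has as many channels as the output-channel count configured for the layer. For example, with self.conv1 = nn.Conv2d(3,64,kernel_size = 3) the output has 64 channels: the layer holds 64 different kernels, and each one is convolved with the input independently to produce one new channel. Note that a kernel does not operate on a single input channel; each kernel spans all input channels (3 here) and sums their contributions into its one output channel.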
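The point about kernels spanning all input channels can be made concrete with a tiny hand-rolled example. It is a sketch with made-up numbers (2 input channels, a 2×2 kernel, no bias) and it evaluates only one output position; the function name `conv_at_origin` is hypothetical.

```python
def conv_at_origin(image, kernel):
    """Value of one output pixel (top-left position) for ONE kernel.

    `image` and `kernel` are nested lists indexed [channel][row][col];
    the kernel must have as many channels as the image.
    """
    assert len(image) == len(kernel)
    total = 0.0
    for c in range(len(kernel)):                  # sum over ALL input channels
        for i in range(len(kernel[c])):
            for j in range(len(kernel[c][i])):
                total += image[c][i][j] * kernel[c][i][j]
    return total

# Toy 2-channel 2x2 input and one 2-channel 2x2 kernel (hypothetical values).
image = [[[1, 2], [3, 4]],
         [[5, 6], [7, 8]]]
kernel = [[[1, 0], [0, 1]],
          [[1, 1], [1, 1]]]

value = conv_at_origin(image, kernel)
print(value)  # 31.0  (channel 0 contributes 5, channel 1 contributes 26)
```

One kernel therefore yields one number per position regardless of how many input channels there are; 64 kernels yield 64 output channels.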

Pooling output size:

Height: $H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding_H - dilation_H \times (kernel\_size_H - 1) - 1}{stride_H} + 1 \right\rfloor$
Width: $W_{out} = \left\lfloor \frac{W_{in} + 2 \times padding_W - dilation_W \times (kernel\_size_W - 1) - 1}{stride_W} + 1 \right\rfloor$

where:

$H_{in}$ and $W_{in}$ are the input height and width.
$padding_H$ and $padding_W$ are the padding amounts in the height and width directions.
$kernel\_size_H$ and $kernel\_size_W$ are the sizes of the convolution kernel or pooling window in the height and width directions.
$stride_H$ and $stride_W$ are the strides in the height and width directions.
$dilation_H$ and $dilation_W$ are the dilation factors in the height and width directions.
$\lfloor \cdot \rfloor$ means rounding down, which is the default behavior (ceil_mode=False).

Note that $dilation \times (kernel\_size - 1) + 1$ is the size of the region the kernel covers after dilation. For example, a $3 \times 3$ kernel with a dilation factor of 2 actually covers a $5 \times 5$ region (the original kernel size plus the gaps introduced by dilation).
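The pooling formula and the dilation note translate directly into a few lines of Python. This is a sketch of the arithmetic only; the function names `effective_kernel` and `out_size` are made up here, and rounding down is assumed (ceil_mode=False).

```python
import math

def effective_kernel(kernel_size: int, dilation: int = 1) -> int:
    """Span covered by a dilated kernel along one dimension."""
    return dilation * (kernel_size - 1) + 1

def out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one dimension, rounded down."""
    span = effective_kernel(kernel_size, dilation)
    return math.floor((size_in + 2 * padding - span) / stride + 1)

print(effective_kernel(3, dilation=2))        # 5
print(out_size(13, kernel_size=2, stride=2))  # 6
```

The second call is the case where rounding matters: the unrounded value is 6.5, and the layer outputs 6.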

Calculation walkthrough:

Input: $(3 \times 32 \times 32)$

conv1: 64 kernels, so the output also has 64 channels. -> $(64 \times 30 \times 30)$
$\text{output size} = \frac{32 - 3 + 2 \times 0}{1} + 1 = 30$

pool1: the channel count is unchanged; the stride is not specified, so it defaults to kernel_size (2 here). -> $(64 \times 15 \times 15)$
$\text{output size} = \left\lfloor \frac{30 + 2 \times 0 - 1 \times (2 - 1) - 1}{2} + 1 \right\rfloor = \lfloor 14 + 1 \rfloor = 15$

conv2: -> $(64 \times 13 \times 13)$
$\text{output size} = \frac{15 - 3 + 2 \times 0}{1} + 1 = 13$

pool2: -> $(64 \times 6 \times 6)$
$\text{output size} = \left\lfloor \frac{13 + 2 \times 0 - 1 \times (2 - 1) - 1}{2} + 1 \right\rfloor = \lfloor 5.5 + 1 \rfloor = \lfloor 6.5 \rfloor = 6$ (the result is rounded down)

conv3: -> $(128 \times 4 \times 4)$
$\text{output size} = \frac{6 - 3 + 2 \times 0}{1} + 1 = 4$

pool3: -> $(128 \times 2 \times 2)$
$\text{output size} = \left\lfloor \frac{4 + 2 \times 0 - 1 \times (2 - 1) - 1}{2} + 1 \right\rfloor = \lfloor 1 + 1 \rfloor = 2$

flatten: -> $128 \times 2 \times 2 = 512$

fc1: -> $256$

fc2: -> $num\_classes$ (10)
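The whole walkthrough can be checked by chaining the two size formulas over the three conv/pool stages. This is a sketch that assumes no padding, stride 1 for the convolutions, and a pooling stride equal to the window size, as in the model above; `conv_out` and `pool_out` are helper names made up for this example.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output length of a convolution along one dimension (rounded down)."""
    return (size - kernel_size + 2 * padding) // stride + 1

def pool_out(size, kernel_size):
    """Output length of max pooling whose stride equals its window size."""
    return (size - kernel_size) // kernel_size + 1

size = 32
trace = [(3, size, size)]                  # input: 3 channels, 32 x 32
for out_channels in (64, 64, 128):         # conv1, conv2, conv3 output channels
    size = conv_out(size, kernel_size=3)
    trace.append((out_channels, size, size))
    size = pool_out(size, kernel_size=2)
    trace.append((out_channels, size, size))

channels, height, width = trace[-1]
flattened = channels * height * width      # input size of fc1

for shape in trace:
    print(shape)
print(flattened)  # 512
```

The last line explains why `fc1` is declared as `nn.Linear(512,256)`: changing the input resolution or any kernel size would change this number.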

```python
# Build the model and print a summary
from torchinfo import summary
# Move the model to the selected device (GPU if available, otherwise CPU)
model = Model().to(device)

summary(model)
```


=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
Model                                    --
├─Conv2d: 1-1                            1,792
├─MaxPool2d: 1-2                         --
├─Conv2d: 1-3                            36,928
├─MaxPool2d: 1-4                         --
├─Conv2d: 1-5                            73,856
├─MaxPool2d: 1-6                         --
├─Linear: 1-7                            131,328
├─Linear: 1-8                            2,570
=================================================================
Total params: 246,474
Trainable params: 246,474
Non-trainable params: 0
=================================================================
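The parameter counts in the summary can be reproduced by hand, which is a useful sanity check on the layer definitions. The sketch below assumes every layer has a bias term (the default for `nn.Conv2d` and `nn.Linear`); the helper names are made up for this example.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (out * in * k * k) plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features

counts = {
    "conv1": conv2d_params(3, 64, 3),     # 1,792
    "conv2": conv2d_params(64, 64, 3),    # 36,928
    "conv3": conv2d_params(64, 128, 3),   # 73,856
    "fc1": linear_params(512, 256),       # 131,328
    "fc2": linear_params(256, 10),        # 2,570
}
total = sum(counts.values())
print(total)  # 246474
```

The pooling layers have no parameters, which is why they show `--` in the table, and more than half of the total sits in `fc1`.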

3. Preparation before training

```python
loss_fn = nn.CrossEntropyLoss() # loss function
learn_rate = 1e-2 # learning rate
opt = torch.optim.SGD(model.parameters(),lr = learn_rate)
```
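`nn.CrossEntropyLoss` takes raw scores (logits) and applies log-softmax internally, which is why the model's forward pass ends at `fc2` without a softmax. A pure-Python sketch of the per-sample computation, with made-up logits, shows what the loss values mean; `cross_entropy` here is an illustrative helper, not the library function.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# All 10 classes equally likely: the loss equals ln(10), about 2.303.
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A clearly higher score on the correct class gives a much smaller loss.
confident_loss = cross_entropy([5.0] + [0.0] * 9, target=0)

print(round(uniform_loss, 3), confident_loss < uniform_loss)  # 2.303 True
```

An untrained 10-class model should therefore start near a loss of 2.3, which is in line with the first-epoch training loss of 2.261 reported in the training log of this post.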

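4. Defining the training function

```python
def train(dataloader,model,loss_fn,optimizer):
    size = len(dataloader.dataset) # size of the training set: 50000 images
    num_batches = len(dataloader)  # number of batches: 1563 (50000/32, rounded up)
    train_acc,train_loss = 0,0     # running accuracy and loss

    for X,y in dataloader:
        X,y = X.to(device),y.to(device)
        # Forward pass and loss
        pred = model(X)
        loss = loss_fn(pred,y)
        # Backpropagation
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        loss.backward()        # compute gradients
        optimizer.step()       # update the parameters
        # Accumulate the number of correct predictions and the batch loss
        train_acc += (pred.argmax(1)==y).type(torch.float).sum().item()
        train_loss += loss.item()

    train_acc /= size          # correct predictions / total samples
    train_loss /= num_batches  # average of the per-batch losses

    return train_acc,train_loss
```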
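One detail of the averaging is worth a short sketch. With the default settings of `nn.CrossEntropyLoss`, each `loss.item()` is already the mean over its batch, so dividing the sum by the number of batches gives a mean of batch means. That matches the per-sample mean exactly only when all batches have the same size. The numbers below are hypothetical and exaggerated to make the difference visible.

```python
# (mean loss of the batch, number of samples in the batch): hypothetical values
batches = [(1.0, 32), (1.0, 32), (4.0, 16)]

# What train() computes: the average of the per-batch means.
mean_of_batch_means = sum(loss for loss, _ in batches) / len(batches)

# The per-sample average weights each batch by its size.
per_sample_mean = (
    sum(loss * n for loss, n in batches) / sum(n for _, n in batches)
)

print(mean_of_batch_means, per_sample_mean)  # 2.0 1.6
```

For CIFAR10 with a batch size of 32, only the last of 1563 training batches is smaller (16 images), so the two averages should be almost identical in practice. The accuracy is not affected, because it is divided by the exact number of samples.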

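5. Defining the test function

```python
def test(dataloader,model,loss_fn):
    size = len(dataloader.dataset) # size of the test set: 10000 images
    num_batches = len(dataloader)  # number of batches: 313 (10000/32, rounded up)
    test_acc,test_loss = 0,0
    # No training happens here, so gradient tracking is disabled to save memory and computation
    with torch.no_grad():
        for X,y in dataloader:
            X,y = X.to(device),y.to(device)
            # Forward pass and loss
            pred = model(X)
            loss = loss_fn(pred,y)
            # Accumulate the number of correct predictions and the batch loss
            test_acc += (pred.argmax(1)==y).type(torch.float).sum().item()
            test_loss += loss.item()

    test_acc /= size
    test_loss /= num_batches

    return test_acc,test_loss
```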
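What `torch.no_grad()` changes can be checked on a small example. The `nn.Linear` layer below is a stand-in for the model, chosen only for illustration: the outputs are identical with and without the context manager, but inside it no computation graph is recorded.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)      # hypothetical stand-in for the model
x = torch.randn(3, 4)

tracked = layer(x)           # normal forward pass: autograd records the graph
with torch.no_grad():
    untracked = layer(x)     # same values, but nothing is recorded

same_values = torch.equal(tracked, untracked)
print(tracked.requires_grad, untracked.requires_grad, same_values)  # True False True
```

`model.eval()` is a separate switch: it changes the behavior of layers such as Dropout and BatchNorm. The model in this post has neither, so `eval()` does not change its outputs, but calling it before testing is still a good habit.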

6. Training process

```python
epochs = 10
train_loss = []
train_acc = []
test_loss = []
test_acc = []

for epoch in range(epochs):
    model.train()  # training mode
    epoch_train_acc,epoch_train_loss = train(train_dl,model,loss_fn,opt)
    model.eval()   # evaluation mode
    epoch_test_acc,epoch_test_loss = test(test_dl,model,loss_fn)

    # Keep the metrics of every epoch for plotting
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    template = ('Epoch:{:2d},Train_acc:{:.1f}%,Train_loss:{:.3f},Test_acc:{:.1f}%,Test_loss:{:.3f}')
    print(template.format(epoch+1,epoch_train_acc*100,epoch_train_loss,epoch_test_acc*100,epoch_test_loss))

print('Done')
```


C:\Users\chengyuanting\.conda\envs\pytorch_cpu\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)


Epoch: 1,Train_acc:14.8%,Train_loss:2.261,Test_acc:21.8%,Test_loss:2.078
Epoch: 2,Train_acc:24.5%,Train_loss:2.020,Test_acc:28.7%,Test_loss:1.945
Epoch: 3,Train_acc:32.6%,Train_loss:1.835,Test_acc:38.6%,Test_loss:1.688
Epoch: 4,Train_acc:40.0%,Train_loss:1.643,Test_acc:41.8%,Test_loss:1.604
Epoch: 5,Train_acc:44.2%,Train_loss:1.533,Test_acc:46.8%,Test_loss:1.456
Epoch: 6,Train_acc:47.7%,Train_loss:1.439,Test_acc:49.3%,Test_loss:1.409
Epoch: 7,Train_acc:51.2%,Train_loss:1.361,Test_acc:52.8%,Test_loss:1.312
Epoch: 8,Train_acc:54.2%,Train_loss:1.286,Test_acc:53.1%,Test_loss:1.331
Epoch: 9,Train_acc:56.6%,Train_loss:1.224,Test_acc:56.7%,Test_loss:1.227
Epoch:10,Train_acc:58.8%,Train_loss:1.168,Test_acc:57.2%,Test_loss:1.196
Done

7. Visualizing the results

```python
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")
plt.rcParams['font.sans-serif'] = ['SimHei']   # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False     # render the minus sign correctly
plt.rcParams['figure.dpi'] = 100

epochs_range = range(epochs)

plt.figure(figsize=(12,3))
plt.subplot(1,2,1)

plt.plot(epochs_range,train_acc,label='Training Accuracy')
plt.plot(epochs_range,test_acc,label='Test Accuracy')
plt.legend(loc = 'lower right')
plt.title('Training and Test Accuracy')

plt.subplot(1,2,2)

plt.plot(epochs_range,train_loss,label='Training Loss')
plt.plot(epochs_range,test_loss,label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Test Loss')
plt.show()
```
