- 🍨 This post is a learning-record blog entry for the 🔗365天深度学习训练营 (365-day deep learning training camp)
- 🍖 Original author: K同学啊
Preface
Environment versions
python 3.9.23
pytorch 2.5.1
pytorch-cuda 11.8
pytorch-mutex 1.0
torchaudio 2.5.1
torchinfo 1.8.0
torchvision 0.20.1
Visual Studio Code 1.104.2 (user setup)
What is ResNet-50?
- It used to be thought that the more layers a neural network had, the better it would learn. In practice, though, once a network gets too deep, training becomes harder and accuracy drops instead of rising. ResNet was designed to solve exactly this problem: through a mechanism called the skip connection, information can jump directly from earlier layers to later ones instead of getting "lost" or "scrambled" as it passes through layer after layer. A simple analogy: instead of climbing a very tall staircase step by step, ResNet installs an elevator (the skip connection) that takes you straight from floor 1 to floor 3, which is both faster and less tiring.
- The "50" in ResNet-50 refers to the roughly 50 learnable layers (convolutional and fully connected) in the network. Its structure consists of the following parts:
- Initial convolution: a 7×7 convolution followed by batch normalization (BatchNorm) and ReLU, then a 3×3 max-pooling layer for downsampling.
- Four stages: each stage is a stack of Bottleneck residual blocks. A Bottleneck first reduces the channel dimension with a 1×1 convolution, extracts features with a 3×3 convolution, then restores the dimension with another 1×1 convolution; this design cuts the computation dramatically while preserving expressive power.
- Global average pooling.
- Fully connected classification layer: outputs the final class predictions.
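For a 224×224 input, these stages shrink the feature map step by step. A quick trace of the expected shapes (channels × height × width), which the model summary later in this post confirms:
python
# input image:        3 x 224 x 224
# 7x7 conv, stride 2: 64 x 112 x 112
# 3x3 maxpool, /2:    64 x  56 x  56
# stage conv2_x:     256 x  56 x  56
# stage conv3_x:     512 x  28 x  28
# stage conv4_x:    1024 x  14 x  14
# stage conv5_x:    2048 x   7 x   7
# global avg pool:  2048 x   1 x   1  -> flatten -> fully connected -> class scores
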
What is a residual? What is its essence?
- Suppose you feed an image into a network and expect one output. The traditional approach makes the network compute the final result from scratch, end to end. ResNet changes the perspective: instead of computing the final answer directly, the network only computes "how much adjustment is still needed". For example, if the ideal output is 10 and the current input is already 8, the network only has to learn the "+2". That "+2" is the residual. So the essence of a residual is: let the network learn "how far off it is" rather than "everything from zero". This makes the task simpler, which matters most in very deep networks: training becomes easier and less prone to going astray.
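The whole idea is one line in code. A minimal sketch (the small `block` below is a hypothetical stand-in for a few convolution layers, not the full ResNet block):
python
import torch
import torch.nn as nn

# F(x): a stand-in for the layers inside a residual block (hypothetical example)
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 64, 56, 56)
y = block(x) + x  # skip connection: the block only learns the residual F(x); x is added back
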
Code implementation
Set up the GPU
python
import torch
import torch.nn as nn
from torchvision import transforms, datasets
from PIL import Image
import matplotlib.pyplot as plt
import os, PIL, pathlib, warnings
warnings.filterwarnings("ignore")  # suppress warnings
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

Data import
python
# Load the dataset
data_dir = "../../datasets/bones/"

train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # resize every input image to a uniform size
    transforms.ToTensor(),          # convert to a tensor, scaled into [0, 1]
    transforms.Normalize(           # standardize each channel to a normal (Gaussian) distribution
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])
])

total_data = datasets.ImageFolder(data_dir, transform=train_transforms)
total_data

Print the labels
python
# Mapping from each class name to its numeric label (class_to_idx), used to encode labels during training
total_data.class_to_idx
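# Example output (hypothetical class names; the real ones come from the folder names under data_dir):
# {'class_a': 0, 'class_b': 1, 'class_c': 2}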

Split the dataset and create the data loaders
python
# Split into training and test sets (80:20)
total_size = len(total_data)
train_size = int(0.8 * total_size)
test_size = total_size - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
train_dataset, test_dataset

python
# Training loader: batches of batch_size=4, shuffled each epoch (shuffle=True)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=4,
                                           shuffle=True)
# Test loader: batches of batch_size=4, in order (shuffle=False by default)
test_loader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=4)
Verify the data shapes
python
# Print the shapes of one batch as a sanity check
for X, y in test_loader:
    print(f"Input tensor shape [Batch, Channel, Height, Width]: {X.shape}")
    print(f"Label shape: {y.shape}, dtype: {y.dtype}")
    break
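# Expected output with this pipeline (224x224 RGB images, batch_size=4):
# Input tensor shape [Batch, Channel, Height, Width]: torch.Size([4, 3, 224, 224])
# Label shape: torch.Size([4]), dtype: torch.int64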

Build ResNet-50
python
# Automatically compute "same padding"
def autopad(k, p=None):  # k: kernel size (int or list), p: user-specified padding (optional)
    """
    Compute the padding a convolution needs so that the output feature map
    keeps the same spatial size as the input ("same padding").
    - If k is an int (e.g. 3), padding = k // 2 (e.g. 3 // 2 = 1)
    - If k is a list (e.g. [3, 5]), take half of each element
    """
    if p is None:                    # no padding given explicitly
        if isinstance(k, int):       # is k an int?
            p = k // 2               # int kernel: padding = kernel_size // 2
        else:
            p = [x // 2 for x in k]  # list kernel: compute padding per dimension
    return p
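# Quick sanity check (sketch, run interactively):
# autopad(3) -> 1, autopad(5) -> 2, autopad([3, 5]) -> [1, 2]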
# Residual block type 1: Identity Block (input and output dimensions match)
class IdentityBlock(nn.Module):
    """
    Identity residual block:
    - used where neither the feature-map size nor the channel count changes
    - the shortcut is the raw input x itself, no extra convolution needed
    - structure: 1x1 conv -> 3x3 conv -> 1x1 conv + residual connection
    """
    def __init__(self, in_channel, kernel_size, filters):
        super(IdentityBlock, self).__init__()  # initialize the parent nn.Module

        # Unpack the filters list
        filters1, filters2, filters3 = filters

        # First conv block: 1x1 convolution to reduce dimension (cuts computation)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, filters1, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(filters1),  # batch normalization
            nn.ReLU(inplace=True)      # ReLU; inplace=True modifies the input in place to save memory
        )
        # Second conv block: the main 3x3 convolution; autopad keeps the size unchanged
        self.conv2 = nn.Sequential(
            nn.Conv2d(filters1, filters2, kernel_size, stride=1, padding=autopad(kernel_size), bias=False),
            nn.BatchNorm2d(filters2),
            nn.ReLU(inplace=True)
        )
        # Third conv block: 1x1 convolution to restore the target channel count (e.g. 256)
        self.conv3 = nn.Sequential(
            nn.Conv2d(filters2, filters3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(filters3)
            # Note: no ReLU here! It is applied once, after the residual addition
        )
        # Final ReLU (applied after adding the shortcut)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        """
        Forward pass:
        - run the three conv blocks to get the transformed features x1
        - add x1 to the raw input x (residual connection)
        - apply the final ReLU
        """
        x1 = self.conv1(x)
        x1 = self.conv2(x1)
        x1 = self.conv3(x1)
        x = x1 + x        # residual connection: F(x) + x
        x = self.relu(x)  # final activation
        return x
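# Shape check (sketch): an IdentityBlock changes neither channel count nor spatial size:
# IdentityBlock(256, 3, [64, 64, 256])(torch.randn(1, 256, 56, 56)).shape
#   -> torch.Size([1, 256, 56, 56])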
# Residual block type 2: Conv Block (input and output dimensions differ)
class ConvBlock(nn.Module):
    """
    Convolutional residual block:
    - used where downsampling (stride=2) or a channel change is needed
    - the shortcut needs an extra 1x1 convolution to match the output dimensions
    - typically the first residual block of each stage
    """
    def __init__(self, in_channel, kernel_size, filters, stride=2):
        super(ConvBlock, self).__init__()
        filters1, filters2, filters3 = filters

        # Main path (like IdentityBlock, but the first conv may use stride=2)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, filters1, kernel_size=1, stride=stride, padding=0, bias=False),
            nn.BatchNorm2d(filters1),
            nn.ReLU(inplace=True)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filters1, filters2, kernel_size, stride=1, padding=autopad(kernel_size), bias=False),
            nn.BatchNorm2d(filters2),
            nn.ReLU(inplace=True)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filters2, filters3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(filters3)
        )
        # Shortcut path: a 1x1 convolution adjusts the channels and size of x so it can be added to x1
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channel, filters3, kernel_size=1, stride=stride, padding=0, bias=False),
            nn.BatchNorm2d(filters3)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Main path
        x1 = self.conv1(x)
        x1 = self.conv2(x1)
        x1 = self.conv3(x1)
        # Shortcut path: match the dimensions of the raw input
        x2 = self.conv4(x)
        # Residual connection
        x = x1 + x2
        x = self.relu(x)
        return x
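# Shape check (sketch): the default stride=2 halves the spatial size and changes the channels:
# ConvBlock(256, 3, [128, 128, 512])(torch.randn(1, 256, 56, 56)).shape
#   -> torch.Size([1, 512, 28, 28])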
# Main network: ResNet-50
class ResNet50(nn.Module):
    """
    ResNet-50 architecture:
    - five main stages in total (conv1 through conv5)
    - each stage contains several residual blocks
    - classification output via global average pooling + a fully connected layer
    """
    def __init__(self, classes=3):
        super(ResNet50, self).__init__()

        # Stage 1: initial convolution + max pooling (aggressive downsampling)
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),  # 3 input channels (RGB)
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # downsample again
        )
        # Stage 2 (conv2_x)
        self.conv2 = nn.Sequential(
            ConvBlock(64, 3, [64, 64, 256], stride=1),
            IdentityBlock(256, 3, [64, 64, 256]),
            IdentityBlock(256, 3, [64, 64, 256])
        )
        # Stage 3 (conv3_x)
        self.conv3 = nn.Sequential(
            ConvBlock(256, 3, [128, 128, 512]),  # stride=2 (default), downsamples
            IdentityBlock(512, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512])
        )
        # Stage 4 (conv4_x)
        self.conv4 = nn.Sequential(
            ConvBlock(512, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024])
        )
        # Stage 5 (conv5_x)
        self.conv5 = nn.Sequential(
            ConvBlock(1024, 3, [512, 512, 2048]),
            IdentityBlock(2048, 3, [512, 512, 2048]),
            IdentityBlock(2048, 3, [512, 512, 2048])
        )
        # Global average pooling (the 7x7 feature map -> 1x1)
        self.pool = nn.AvgPool2d(kernel_size=7, stride=7, padding=0)
        # Fully connected classification layer
        self.fc = nn.Linear(2048, classes)

    def forward(self, x):
        """
        Forward pass:
        input -> conv1 -> conv2 -> conv3 -> conv4 -> conv5 -> pool -> flatten -> fc -> output
        """
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.pool(x)
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x

# Instantiate the model and move it to the device (GPU/CPU)
# Pick the device automatically: CUDA if a GPU is available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the ResNet-50 model for a 3-class task
model = ResNet50(classes=3).to(device)  # .to(device) moves the model parameters to the chosen device
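# Smoke test (sketch): push one random batch through the untrained model
# out = model(torch.randn(2, 3, 224, 224).to(device))
# print(out.shape)  # expected: torch.Size([2, 3])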
Inspect the model
python
# Count the model's parameters and related statistics
import torchsummary as summary  # the torchsummary module, imported under the alias "summary"
summary.summary(model, (3, 224, 224))


The printout is too long to capture in one screenshot; the complete model structure is as follows:
python
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
ReLU-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 56, 56] 0
Conv2d-5 [-1, 64, 56, 56] 4,096
BatchNorm2d-6 [-1, 64, 56, 56] 128
ReLU-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 64, 56, 56] 36,864
BatchNorm2d-9 [-1, 64, 56, 56] 128
ReLU-10 [-1, 64, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 16,384
BatchNorm2d-12 [-1, 256, 56, 56] 512
Conv2d-13 [-1, 256, 56, 56] 16,384
BatchNorm2d-14 [-1, 256, 56, 56] 512
ReLU-15 [-1, 256, 56, 56] 0
ConvBlock-16 [-1, 256, 56, 56] 0
Conv2d-17 [-1, 64, 56, 56] 16,384
BatchNorm2d-18 [-1, 64, 56, 56] 128
ReLU-19 [-1, 64, 56, 56] 0
Conv2d-20 [-1, 64, 56, 56] 36,864
BatchNorm2d-21 [-1, 64, 56, 56] 128
ReLU-22 [-1, 64, 56, 56] 0
Conv2d-23 [-1, 256, 56, 56] 16,384
BatchNorm2d-24 [-1, 256, 56, 56] 512
ReLU-25 [-1, 256, 56, 56] 0
IdentityBlock-26 [-1, 256, 56, 56] 0
Conv2d-27 [-1, 64, 56, 56] 16,384
BatchNorm2d-28 [-1, 64, 56, 56] 128
ReLU-29 [-1, 64, 56, 56] 0
Conv2d-30 [-1, 64, 56, 56] 36,864
BatchNorm2d-31 [-1, 64, 56, 56] 128
ReLU-32 [-1, 64, 56, 56] 0
Conv2d-33 [-1, 256, 56, 56] 16,384
BatchNorm2d-34 [-1, 256, 56, 56] 512
ReLU-35 [-1, 256, 56, 56] 0
IdentityBlock-36 [-1, 256, 56, 56] 0
Conv2d-37 [-1, 128, 28, 28] 32,768
BatchNorm2d-38 [-1, 128, 28, 28] 256
ReLU-39 [-1, 128, 28, 28] 0
Conv2d-40 [-1, 128, 28, 28] 147,456
BatchNorm2d-41 [-1, 128, 28, 28] 256
ReLU-42 [-1, 128, 28, 28] 0
Conv2d-43 [-1, 512, 28, 28] 65,536
BatchNorm2d-44 [-1, 512, 28, 28] 1,024
Conv2d-45 [-1, 512, 28, 28] 131,072
BatchNorm2d-46 [-1, 512, 28, 28] 1,024
ReLU-47 [-1, 512, 28, 28] 0
ConvBlock-48 [-1, 512, 28, 28] 0
Conv2d-49 [-1, 128, 28, 28] 65,536
BatchNorm2d-50 [-1, 128, 28, 28] 256
ReLU-51 [-1, 128, 28, 28] 0
Conv2d-52 [-1, 128, 28, 28] 147,456
BatchNorm2d-53 [-1, 128, 28, 28] 256
ReLU-54 [-1, 128, 28, 28] 0
Conv2d-55 [-1, 512, 28, 28] 65,536
BatchNorm2d-56 [-1, 512, 28, 28] 1,024
ReLU-57 [-1, 512, 28, 28] 0
IdentityBlock-58 [-1, 512, 28, 28] 0
Conv2d-59 [-1, 128, 28, 28] 65,536
BatchNorm2d-60 [-1, 128, 28, 28] 256
ReLU-61 [-1, 128, 28, 28] 0
Conv2d-62 [-1, 128, 28, 28] 147,456
BatchNorm2d-63 [-1, 128, 28, 28] 256
ReLU-64 [-1, 128, 28, 28] 0
Conv2d-65 [-1, 512, 28, 28] 65,536
BatchNorm2d-66 [-1, 512, 28, 28] 1,024
ReLU-67 [-1, 512, 28, 28] 0
IdentityBlock-68 [-1, 512, 28, 28] 0
Conv2d-69 [-1, 128, 28, 28] 65,536
BatchNorm2d-70 [-1, 128, 28, 28] 256
ReLU-71 [-1, 128, 28, 28] 0
Conv2d-72 [-1, 128, 28, 28] 147,456
BatchNorm2d-73 [-1, 128, 28, 28] 256
ReLU-74 [-1, 128, 28, 28] 0
Conv2d-75 [-1, 512, 28, 28] 65,536
BatchNorm2d-76 [-1, 512, 28, 28] 1,024
ReLU-77 [-1, 512, 28, 28] 0
IdentityBlock-78 [-1, 512, 28, 28] 0
Conv2d-79 [-1, 256, 14, 14] 131,072
BatchNorm2d-80 [-1, 256, 14, 14] 512
ReLU-81 [-1, 256, 14, 14] 0
Conv2d-82 [-1, 256, 14, 14] 589,824
BatchNorm2d-83 [-1, 256, 14, 14] 512
ReLU-84 [-1, 256, 14, 14] 0
Conv2d-85 [-1, 1024, 14, 14] 262,144
BatchNorm2d-86 [-1, 1024, 14, 14] 2,048
Conv2d-87 [-1, 1024, 14, 14] 524,288
BatchNorm2d-88 [-1, 1024, 14, 14] 2,048
ReLU-89 [-1, 1024, 14, 14] 0
ConvBlock-90 [-1, 1024, 14, 14] 0
Conv2d-91 [-1, 256, 14, 14] 262,144
BatchNorm2d-92 [-1, 256, 14, 14] 512
ReLU-93 [-1, 256, 14, 14] 0
Conv2d-94 [-1, 256, 14, 14] 589,824
BatchNorm2d-95 [-1, 256, 14, 14] 512
ReLU-96 [-1, 256, 14, 14] 0
Conv2d-97 [-1, 1024, 14, 14] 262,144
BatchNorm2d-98 [-1, 1024, 14, 14] 2,048
ReLU-99 [-1, 1024, 14, 14] 0
IdentityBlock-100 [-1, 1024, 14, 14] 0
Conv2d-101 [-1, 256, 14, 14] 262,144
BatchNorm2d-102 [-1, 256, 14, 14] 512
ReLU-103 [-1, 256, 14, 14] 0
Conv2d-104 [-1, 256, 14, 14] 589,824
BatchNorm2d-105 [-1, 256, 14, 14] 512
ReLU-106 [-1, 256, 14, 14] 0
Conv2d-107 [-1, 1024, 14, 14] 262,144
BatchNorm2d-108 [-1, 1024, 14, 14] 2,048
ReLU-109 [-1, 1024, 14, 14] 0
IdentityBlock-110 [-1, 1024, 14, 14] 0
Conv2d-111 [-1, 256, 14, 14] 262,144
BatchNorm2d-112 [-1, 256, 14, 14] 512
ReLU-113 [-1, 256, 14, 14] 0
Conv2d-114 [-1, 256, 14, 14] 589,824
BatchNorm2d-115 [-1, 256, 14, 14] 512
ReLU-116 [-1, 256, 14, 14] 0
Conv2d-117 [-1, 1024, 14, 14] 262,144
BatchNorm2d-118 [-1, 1024, 14, 14] 2,048
ReLU-119 [-1, 1024, 14, 14] 0
IdentityBlock-120 [-1, 1024, 14, 14] 0
Conv2d-121 [-1, 256, 14, 14] 262,144
BatchNorm2d-122 [-1, 256, 14, 14] 512
ReLU-123 [-1, 256, 14, 14] 0
Conv2d-124 [-1, 256, 14, 14] 589,824
BatchNorm2d-125 [-1, 256, 14, 14] 512
ReLU-126 [-1, 256, 14, 14] 0
Conv2d-127 [-1, 1024, 14, 14] 262,144
BatchNorm2d-128 [-1, 1024, 14, 14] 2,048
ReLU-129 [-1, 1024, 14, 14] 0
IdentityBlock-130 [-1, 1024, 14, 14] 0
Conv2d-131 [-1, 256, 14, 14] 262,144
BatchNorm2d-132 [-1, 256, 14, 14] 512
ReLU-133 [-1, 256, 14, 14] 0
Conv2d-134 [-1, 256, 14, 14] 589,824
BatchNorm2d-135 [-1, 256, 14, 14] 512
ReLU-136 [-1, 256, 14, 14] 0
Conv2d-137 [-1, 1024, 14, 14] 262,144
BatchNorm2d-138 [-1, 1024, 14, 14] 2,048
ReLU-139 [-1, 1024, 14, 14] 0
IdentityBlock-140 [-1, 1024, 14, 14] 0
Conv2d-141 [-1, 512, 7, 7] 524,288
BatchNorm2d-142 [-1, 512, 7, 7] 1,024
ReLU-143 [-1, 512, 7, 7] 0
Conv2d-144 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-145 [-1, 512, 7, 7] 1,024
ReLU-146 [-1, 512, 7, 7] 0
Conv2d-147 [-1, 2048, 7, 7] 1,048,576
BatchNorm2d-148 [-1, 2048, 7, 7] 4,096
Conv2d-149 [-1, 2048, 7, 7] 2,097,152
BatchNorm2d-150 [-1, 2048, 7, 7] 4,096
ReLU-151 [-1, 2048, 7, 7] 0
ConvBlock-152 [-1, 2048, 7, 7] 0
Conv2d-153 [-1, 512, 7, 7] 1,048,576
BatchNorm2d-154 [-1, 512, 7, 7] 1,024
ReLU-155 [-1, 512, 7, 7] 0
Conv2d-156 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-157 [-1, 512, 7, 7] 1,024
ReLU-158 [-1, 512, 7, 7] 0
Conv2d-159 [-1, 2048, 7, 7] 1,048,576
BatchNorm2d-160 [-1, 2048, 7, 7] 4,096
ReLU-161 [-1, 2048, 7, 7] 0
IdentityBlock-162 [-1, 2048, 7, 7] 0
Conv2d-163 [-1, 512, 7, 7] 1,048,576
BatchNorm2d-164 [-1, 512, 7, 7] 1,024
ReLU-165 [-1, 512, 7, 7] 0
Conv2d-166 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-167 [-1, 512, 7, 7] 1,024
ReLU-168 [-1, 512, 7, 7] 0
Conv2d-169 [-1, 2048, 7, 7] 1,048,576
BatchNorm2d-170 [-1, 2048, 7, 7] 4,096
ReLU-171 [-1, 2048, 7, 7] 0
IdentityBlock-172 [-1, 2048, 7, 7] 0
AvgPool2d-173 [-1, 2048, 1, 1] 0
Linear-174 [-1, 3] 6,147
================================================================
Total params: 23,514,179
Trainable params: 23,514,179
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 274.49
Params size (MB): 89.70
Estimated Total Size (MB): 364.77
----------------------------------------------------------------
Training and test functions
python
# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # number of training samples
    num_batches = len(dataloader)   # number of batches (size / batch_size, rounded up)
    train_loss, train_acc = 0, 0    # running loss and count of correct predictions

    for X, y in dataloader:         # fetch a batch of images and labels
        X, y = X.to(device), y.to(device)

        # Compute the prediction error
        pred = model(X)             # forward pass
        loss = loss_fn(pred, y)     # gap between the prediction and the ground truth

        # Backpropagation
        optimizer.zero_grad()       # reset the gradients
        loss.backward()             # backward pass
        optimizer.step()            # update the parameters

        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()

    train_acc /= size               # overall training accuracy
    train_loss /= num_batches       # mean training loss
    return train_acc, train_loss
python
# Test function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  # number of test samples
    num_batches = len(dataloader)   # number of batches (size / batch_size, rounded up)
    test_loss, test_acc = 0, 0

    # No training happens here, so disable gradient tracking to save memory and compute
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)

            # Compute the loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)

            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc /= size
    test_loss /= num_batches
    return test_acc, test_loss
Training
python
# Training
import copy
import torch
import torch.nn as nn
import torch.optim as optim

# Initialize the optimizer and the loss function
optimizer = optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()  # loss function
epochs = 10                      # number of training epochs

# Lists for recording the metrics
train_loss = []
train_acc = []
test_loss = []
test_acc = []

best_acc = 0                       # best accuracy so far, used to decide when to save the model
best_model = copy.deepcopy(model)  # fallback so best_model exists even if accuracy never improves

for epoch in range(epochs):
    # Training phase
    model.train()  # training mode (enables the training behaviour of Dropout, BatchNorm, etc.)
    epoch_train_acc, epoch_train_loss = train(train_loader, model, loss_fn, optimizer)

    # Test phase
    model.eval()   # evaluation mode (disables Dropout, freezes BatchNorm statistics, etc.)
    epoch_test_acc, epoch_test_loss = test(test_loader, model, loss_fn)

    # Keep the best model
    if epoch_test_acc > best_acc:
        best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)  # deep-copy the current best model

    # Record the training/test metrics
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    # Current learning rate
    lr = optimizer.state_dict()['param_groups'][0]['lr']

    # Print this epoch's metrics
    template = ('Epoch {:2d}, train acc: {:.1f}%, train loss: {:.3f}, '
                'test acc: {:.1f}%, test loss: {:.3f}, lr: {:.2E}')
    print(template.format(epoch + 1,
                          epoch_train_acc * 100,
                          epoch_train_loss,
                          epoch_test_acc * 100,
                          epoch_test_loss,
                          lr))

# Save the best model to disk
PATH = './best_model.pth'                  # file name for the saved parameters
torch.save(best_model.state_dict(), PATH)  # save the model's parameter state dict

print('Done')

Visualize the training results
python
# Visualize the results
import matplotlib.pyplot as plt
# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

# Matplotlib display settings (kept from the original setup: CJK-capable font and a proper minus sign)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.dpi'] = 100  # figure resolution

from datetime import datetime
current_time = datetime.now()     # timestamp for labeling the figure

epochs_range = range(epochs)

# Create the figure and its subplots
plt.figure(figsize=(12, 3))

# Subplot 1: accuracy curves
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Test Accuracy')
plt.xlabel(f'Epochs (generated at {current_time.strftime("%Y-%m-%d %H:%M:%S")})')  # timestamp on the x-axis

# Subplot 2: loss curves
plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Test Loss')

plt.show()

Model evaluation
python
# Load the saved parameters into the model
best_model.load_state_dict(torch.load(PATH, map_location=device))

# Evaluate the best model on the test set
epoch_test_acc, epoch_test_loss = test(test_loader, best_model, loss_fn)

# The best model's test metrics
epoch_test_acc, epoch_test_loss

Learning summary
In a deep network like ResNet-50, if every residual block processed high-dimensional feature maps (256, 512, or even 1024 channels) directly with 3×3 convolutions, the computational cost would be huge. A concrete example:

Suppose the input is a 28×28×256 feature map (H×W×C) and we want a plain 3×3 convolution to produce 28×28×256 (same size, same channel count).

- Plain approach: 256 kernels of size 3×3×256.
  Parameters = 256×3×3×256 = 589,824

The ResNet-50 Bottleneck instead works in three steps (verified in the sketch below):

- 1×1 convolution to reduce dimension: 256 channels → 64 channels.
  Parameters = 64×1×1×256 = 16,384
- 3×3 convolution: feature extraction in the low-dimensional (64-channel) space.
  Parameters = 64×3×3×64 = 36,864
- 1×1 convolution to restore dimension: 64 channels → 256 channels.
  Parameters = 256×1×1×64 = 16,384

Total parameters = 16,384 + 36,864 + 16,384 = 69,632
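These numbers are easy to double-check with a small PyTorch sketch (bias-free convolutions, matching the blocks implemented above):
python
import torch.nn as nn

plain = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),            # 1x1 reduce:  16,384
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # 3x3 extract: 36,864
    nn.Conv2d(64, 256, kernel_size=1, bias=False),            # 1x1 restore: 16,384
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(plain))       # 589824
print(count(bottleneck))  # 69632
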
So why does the parameter count drop by so much while performance barely suffers? For instance, doesn't compressing the features from 256 dimensions down to 64 throw away important information? In practice it does not, for two reasons:
- Although a 1×1 convolution leaves the spatial size unchanged, at every pixel it linearly combines all the channels, effectively "re-weighting" the features along the channel dimension (see the sketch after this list). This step does more than reduce the dimension; it also distills more useful combined features.
- The core nonlinear transformation happens in the low-dimensional space. The real spatial feature extraction is done by the middle 3×3 convolution, and doing it with fewer channels forces the network to learn a more compact, efficient representation: a bit like pushing the model to do more with fewer resources.
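To make the first point concrete: a 1×1 convolution is exactly a per-pixel linear layer over the channels. A small sketch (the weights are copied between the two layers so the outputs match):
python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 7, 7)
conv = nn.Conv2d(256, 64, kernel_size=1, bias=False)  # mixes channels, no spatial context

lin = nn.Linear(256, 64, bias=False)
lin.weight.data = conv.weight.data.view(64, 256)      # reuse the exact same weights

out_conv = conv(x)
out_lin = lin(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # apply per pixel, restore layout
print(torch.allclose(out_conv, out_lin, atol=1e-5))   # True
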
So the Bottleneck does not simply "cut information away"; it organizes the computation in a smarter order: compress the redundancy first, process efficiently, then restore the expressive power. That is why ResNet-50 can be both deep and fast while still reaching top-tier performance on ImageNet. Implementing IdentityBlock and ConvBlock by hand, following the reference code, gave me a much clearer picture of ResNet-50's design: it saves parameters with almost no loss in performance.