Deep Learning from Zero (1): PyTorch - Python Deep Learning Neural Network API


  • [Deep Learning from Zero (1): PyTorch - Python Deep Learning Neural Network API](#从〇开始深度学习(1)——PyTorch - Python Deep Learning Neural Network API)
    • <零>Preface
    • [<壹>Part 1: Tensors and Operations](#<壹>Part 1: Tensors and Operations)
      • [1.Section 1: Introducing PyTorch](#1.Section 1: Introducing PyTorch)
        • [1.1.PyTorch Prerequisites - Neural Network Programming Series](#1.1.PyTorch Prerequisites - Neural Network Programming Series)
        • [1.2.PyTorch Explained - Python Deep Learning Neural Network API](#1.2.PyTorch Explained - Python Deep Learning Neural Network API)
        • [1.3.PyTorch Install - Quick and Easy](#1.3.PyTorch Install - Quick and Easy)
        • [1.4.Cuda Explained - Why Deep Learning Uses GPUs](#1.4.Cuda Explained - Why Deep Learning Uses GPUs)
      • [2.Section 2: Introducing Tensors](#2.Section 2: Introducing Tensors)
        • [2.1.Tensors Explained - Data Structures of Deep Learning](#2.1.Tensors Explained - Data Structures of Deep Learning)
        • [2.2.Rank, Axes, and Shape Explained - Tensors for Deep Learning](#2.2.Rank, Axes, and Shape Explained - Tensors for Deep Learning)
        • [2.3.CNN Tensor Shape Explained - CNNs and Feature Maps](#2.3.CNN Tensor Shape Explained - CNNs and Feature Maps)
        • [2.4.PyTorch Tensors Explained - Neural Network Programming](#2.4.PyTorch Tensors Explained - Neural Network Programming)
          • [(1) Tensor Attributes](#(1) Tensor Attributes)
          • [(2) `torch.dtype`](#(2) torch.dtype)
          • [(3) `torch.device`](#(3) torch.device)
          • [(4) `torch.layout`](#(4) torch.layout)
          • [(5) Creating tensors using data](#(5) Creating tensors using data)
          • [(6) Creation options without data](#(6) Creation options without data)
        • [2.5.Creating PyTorch Tensors - Best Options](#2.5.Creating PyTorch Tensors - Best Options)
          • [(1) The difference between `torch.Tensor` and `torch.tensor`](#(1) The difference between torch.Tensor and torch.tensor)
          • [(2) The difference between `torch.as_tensor` and `torch.from_numpy`](#(2) The difference between torch.as_tensor and torch.from_numpy)
          • [(3) The difference between the first two and the last two](#(3) The difference between the first two and the last two)
      • [3.Section 3: Tensor Operations](#3.Section 3: Tensor Operations)
        • [3.1.Flatten, Reshape, and Squeeze Explained - Tensors for Deep Learning](#3.1.Flatten, Reshape, and Squeeze Explained - Tensors for Deep Learning)
          • [(1) Reshape](#(1) Reshape)
          • [(2) Flatten](#(2) Flatten)
        • [3.2.CNN Flatten Operation Visualized - Tensor Batch Processing](#3.2.CNN Flatten Operation Visualized - Tensor Batch Processing)
        • [3.3.Tensors for Deep Learning - Broadcasting and Element-wise Operations](#3.3.Tensors for Deep Learning - Broadcasting and Element-wise Operations)
          • [(1) Arithmetic operations](#(1) Arithmetic operations)
          • [(*) Broadcasting Tensors](#(*) Broadcasting Tensors)
          • [(2) Comparison Operations](#(2) Comparison Operations)
          • [(3) Some Functions](#(3) Some Functions)
        • [3.4.ArgMax and Reduction Ops - Tensors for Deep Learning](#3.4.ArgMax and Reduction Ops - Tensors for Deep Learning)
          • [(1) Reduction Options](#(1) Reduction Options)
          • [(2) Argmax](#(2) Argmax)
          • [(3) Accessing elements inside tensors](#(3) Accessing elements inside tensors)
    • [<贰>Part 2: Neural Network Training](#<贰>Part 2: Neural Network Training)
      • [1.Section 1: Data and Data Processing](#1.Section 1: Data and Data Processing)
        • [1.1.Importance of Data in Deep Learning - Fashion MNIST for AI](#1.1.Importance of Data in Deep Learning - Fashion MNIST for AI)
        • [1.2.Extract, Transform, Load (ETL) - Deep Learning Data Preparation](#1.2.Extract, Transform, Load (ETL) - Deep Learning Data Preparation)
          • [(1) What is "ETL"](#(1) What is “ETL”)
          • [(2) How to ETL with PyTorch](#(2) How to ETL with PyTorch)
        • [1.3.PyTorch Datasets and DataLoaders - Training Set Exploration](#1.3.PyTorch Datasets and DataLoaders - Training Set Exploration)
          • [(1) PyTorch Dataset: Working with the training set](#(1) PyTorch Dataset: Working with the training set)
          • [(2) PyTorch DataLoader: Working with batches of data](#(2) PyTorch DataLoader: Working with batches of data)
          • [(3) How to Plot Images Using PyTorch DataLoader](#(3) How to Plot Images Using PyTorch DataLoader)
      • [2.Section 2: Neural Networks and PyTorch Design](#2.Section 2: Neural Networks and PyTorch Design)
        • [2.1.Build PyTorch CNN - Object Oriented Neural Networks](#2.1.Build PyTorch CNN - Object Oriented Neural Networks)
          • [(1) Quick object oriented programming review](#(1) Quick object oriented programming review)
          • [(2) Building a neural network in PyTorch](#(2) Building a neural network in PyTorch)
        • [2.2.CNN Layers - Deep Neural Network Architecture](#2.2.CNN Layers - Deep Neural Network Architecture)
          • [(1) Parameter vs Argument](#(1) Parameter vs Argument)
          • [(2) Two types of parameters](#(2) Two types of parameters)
          • [(3) Descriptions of parameters](#(3) Descriptions of parameters)
          • [(4) Kernel vs Filter](#(4) Kernel vs Filter)
        • [2.3.CNN Weights - Learnable Parameters in Neural Networks](#2.3.CNN Weights - Learnable Parameters in Neural Networks)
          • [(1) Another type of parameters](#(1) Another type of parameters)
          • [(2) Getting an Instance the Network](#(2) Getting an Instance the Network)
          • [(3) Accessing the Network's Layers](#(3) Accessing the Network's Layers)
          • [(4) Accessing the Layer Weights](#(4) Accessing the Layer Weights)
        • [2.4.Callable Neural Networks - Linear Layers in Depth](#2.4.Callable Neural Networks - Linear Layers in Depth)
        • [2.5.How to Debug PyTorch Source Code - Debugging Setup](#2.5.How to Debug PyTorch Source Code - Debugging Setup)
        • [2.6.CNN Forward Method - Deep Learning Implementation](#2.6.CNN Forward Method - Deep Learning Implementation)
          • [(1) convolutional layers](#(1) convolutional layers)
          • [(2) linear layers](#(2) linear layers)
        • [2.7.Forward Propagation Explained - Pass Image to PyTorch Neural Network](#2.7.Forward Propagation Explained - Pass Image to PyTorch Neural Network)
        • [2.8.Neural Network Batch Processing - Pass Image Batch to PyTorch CNN](#2.8.Neural Network Batch Processing - Pass Image Batch to PyTorch CNN)
        • [2.9.CNN Output Size Formula - Bonus Neural Network Debugging Session](#2.9.CNN Output Size Formula - Bonus Neural Network Debugging Session)
      • [3.Section 3: Training Neural Networks](#3.Section 3: Training Neural Networks)
        • [3.1.CNN Training - Using a Single Batch](#3.1.CNN Training - Using a Single Batch)
        • [3.2.CNN Training Loop - Using Multiple Epochs](#3.2.CNN Training Loop - Using Multiple Epochs)
        • [3.3.Building a Confusion Matrix - Analyzing Results Part 1](#3.3.Building a Confusion Matrix - Analyzing Results Part 1)
        • [3.4.Stack vs Concat - Deep Learning Tensor Ops](#3.4.Stack vs Concat - Deep Learning Tensor Ops)
        • [3.5.Using TensorBoard with PyTorch - Analyzing Results Part 2](#3.5.Using TensorBoard with PyTorch - Analyzing Results Part 2)
        • [3.6.Hyperparameter Experimenting - Training Neural Networks](#3.6.Hyperparameter Experimenting - Training Neural Networks)
      • [4.Section 4: Neural Network Experimentation](#4.Section 4: Neural Network Experimentation)
        • [4.1.Custom Code - Neural Network Experimentation Code](#4.1.Custom Code - Neural Network Experimentation Code)
        • [4.2.Custom Code - Simultaneous Hyperparameter Testing](#4.2.Custom Code - Simultaneous Hyperparameter Testing)
        • [4.3.Data Loading - Deep Learning Speed Limit Increase](#4.3.Data Loading - Deep Learning Speed Limit Increase)
        • [4.4.On the GPU - Training Neural Networks with CUDA](#4.4.On the GPU - Training Neural Networks with CUDA)
        • [4.5.Data Normalization - Normalize a Dataset](#4.5.Data Normalization - Normalize a Dataset)
        • [4.6.PyTorch DataLoader Source Code - Debugging Session](#4.6.PyTorch DataLoader Source Code - Debugging Session)
        • [4.7.PyTorch Sequential Models - Neural Networks Made Easy](#4.7.PyTorch Sequential Models - Neural Networks Made Easy)
        • [4.8.Batch Norm In PyTorch - Add Normalization To Conv Net Layers](#4.8.Batch Norm In PyTorch - Add Normalization To Conv Net Layers)
    • <叁>Afterword





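Back to this note: I plan to turn it into a complete set of course notes that runs from the first lesson to the last. The course is DeepLizard's PyTorch tutorial (reference links are listed below). It has 43 lessons in total, and I personally think it is quite good. To be honest, 43 lessons is a lot of material, and some chapters are substantial enough to deserve a note of their own, so this note will also have side posts. If you want the complete course notes, this post is all you need; if you are looking for a specific topic, see the corresponding side post. The series is divided into two major parts. The code in Part 1 is very basic and simple, so feel free to practice it yourself. All the code for Part 2 has been uploaded to GitHub and Gitee, and you can download it if needed: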

LengYuuu/PyTorchStudy_DeepLizard: This repository is used to house code for DeepLizard's PyTorch course. (github.com)

PyTorchStudy_DeepLizard: This repository is used to house code for DeepLizard's PyTorch course. (gitee.com)





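PyTorch - Python Deep Learning Neural Network API - YouTube (may require a proxy to access)

1-PyTorch Prerequisites - Syllabus for Neural Network Programming Course_bilibili (a re-upload on Bilibili with Chinese subtitles, very beginner-friendly)


PyTorch Prerequisites - Syllabus for Neural Network Programming Course - deeplizard (may require a proxy to access)

从零开始的机器学习实践笔记 (Machine Learning Practice Notes from Scratch) - Zhihu (zhihu.com) (an excellent series; unless stated otherwise, "the Zhihu notes" in this article refers to it)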


| Environment | Version |
| --- | --- |
| Windows | win11 |
| Pycharm | 2024.2.4+ |
| Anaconda | 24.11.0 |
| Cuda | 12.4.131 |
| Python | 3.10.15 |
| PyTorch | 2.5.1 |
| torchvision | 0.20.1 |

<壹>Part 1: Tensors and Operations

1.Section 1: Introducing PyTorch

1.1.PyTorch Prerequisites - Neural Network Programming Series


1.2.PyTorch Explained - Python Deep Learning Neural Network API

Torch is a machine learning framework based on the Lua programming language. PyTorch is based on Torch, but written for Python.

Typical packages of PyTorch:

| Package | Description |
| --- | --- |
| torch | The top-level PyTorch package and tensor library. |
| torch.nn | "nn" means neural networks. A subpackage that contains classes and modules, like layers, weights, and forward functions. |
| torch.autograd | A subpackage that handles the derivative calculations needed to optimize the neural network weights. |
| torch.nn.functional | A functional interface that gives us access to functions like loss functions, activation functions, and convolution operations. |
| torch.optim | A subpackage that gives us access to typical optimization algorithms like SGD and Adam. |
| torch.utils | A subpackage that contains utility classes like datasets and data loaders that make data preprocessing much easier. |
| torchvision | A separate package that provides access to popular datasets, model architectures, and image transformations for computer vision. |

  1. All deep learning frameworks have two features: a tensor library, and a package for computing derivatives. For PyTorch, these two are "torch" and "torch.autograd".
  2. "torch.nn", "torch.autograd", "torch.nn.functional", "torch.optim", and "torch.utils" are subpackages of "torch", while "torchvision" is a separate package.

Using PyTorch lets us focus more on neural networks and less on the framework itself.

The Philosophy of PyTorch:

  1. Stay out of the way;
  2. Cater to the impatient;
  3. Promote linear code-flow;
  4. Full inter-operation with the Python ecosystem;
  5. Be as fast as anything else.


Another reason why we need to study PyTorch:

To optimize neural networks, we need to calculate derivatives. To do this computationally, deep learning frameworks use what are called computational graphs. Computational graphs are used to graph the function operations that occur on tensors inside neural networks. These graphs are then used to compute the derivatives needed to optimize the neural network's weights.

PyTorch uses a computational graph called a dynamic computational graph. This means that the graph is generated on the fly as the operations occur. This is in contrast to static graphs, which are fully determined before the actual operations occur. It just so happens that many cutting-edge research topics in deep learning require or benefit greatly from dynamic graphs.
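To make the "generated on the fly" idea concrete, here is a minimal sketch (the function name `repeated_product` is made up for illustration). An ordinary Python loop decides how many multiplications run, and autograd records whatever operations actually executed, so each call can build a different graph.

```python
import torch

def repeated_product(x, n_steps):
    # Ordinary Python control flow decides which operations run;
    # autograd records the graph while these operations execute.
    y = x
    for _ in range(n_steps):
        y = y * x
    return y

x = torch.tensor(2.0, requires_grad=True)

# Two multiplications: y = x**3, so dy/dx = 3 * x**2
y = repeated_product(x, n_steps=2)
y.backward()
grad_cubic = x.grad.item()

# A different loop length builds a different graph: y = x**4, dy/dx = 4 * x**3
x.grad = None  # clear the gradient accumulated by the first backward pass
y = repeated_product(x, n_steps=3)
y.backward()
grad_quartic = x.grad.item()

print(grad_cubic, grad_quartic)
```

Nothing about the graph had to be declared ahead of time; the loop length alone changed which derivative was computed.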

1.3.PyTorch Install - Quick and Easy


1.4.Cuda Explained - Why Deep Learning Uses GPUs

Much of PyTorch is written in Python. However, at bottleneck points, the Python code drops down to C, C++, and CUDA to speed up processing and get a performance boost.


The calculations can be selectively carried out, either on the CPU or on the GPU.

2.Section 2: Introducing Tensors

2.1.Tensors Explained - Data Structures of Deep Learning

In Computer Science, there are: number, array, 2d-array;

In Mathematics, there are: scalar, vector, matrix;

A number is the same as a scalar, an array is the same as a vector, and a 2d-array is the same as a matrix.

| Indexes required | Computer science | Mathematics |
| --- | --- | --- |
| 0 | number | scalar |
| 1 | array | vector |
| 2 | 2d-array | matrix |

When more than two indexes are required to access a specific element, we stop giving specific names to the data structures and begin using more general language.

| Indexes required | Computer science | Mathematics |
| --- | --- | --- |
| n | nd-array | nd-tensor |
2.2.Rank, Axes, and Shape Explained - Tensors for Deep Learning
  • Rank

    • The rank of a tensor refers to the number of dimensions present within the tensor.

    • A tensor's rank tells us how many indexes are needed to refer to a specific element within the tensor.

    • e.g. In a 3 × 3 tensor, we need two indexes to refer to a specific element, so the rank of this tensor is 2.

  • Axes

    • An axis of a tensor is a specific dimension of a tensor.

    • The rank of a tensor tells us how many axes a tensor has.

    • The length of each axis tells us how many indexes are available along each axis.

    • e.g. In the same 3 × 3 tensor, the length of each axis is 3.

  • Shape

    • The shape of a tensor is determined by the length of each axis.
  • Reshape

    • Reshaping changes the grouping of the terms but does not change the underlying terms themselves.
    • One thing to notice about reshaping is that the product of the component values in the shape must equal the total number of elements in the tensor.
    • e.g. A 3 × 3 tensor can be reshaped to a 1 × 9 tensor.
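The definitions above can be checked directly in code. The following is a small sketch with a made-up 3 × 3 tensor: the rank is the length of the shape, two indexes reach one element, and a reshape is only accepted when the product of the new shape equals the number of elements.

```python
import math
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

rank = len(t.shape)           # number of axes
element = t[1][2].item()      # two indexes reach a single element
numel = math.prod(t.shape)    # product of the axis lengths

row = t.reshape(1, 9)         # 1 * 9 == 9, so this is allowed

try:
    t.reshape(2, 5)           # 2 * 5 != 9, so PyTorch rejects it
    reshape_ok = True
except RuntimeError:
    reshape_ok = False

print(rank, element, numel, tuple(row.shape), reshape_ok)
```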
2.3.CNN Tensor Shape Explained - CNNs and Feature Maps

The shape of a CNN input typically has a length of four. This means that we have a rank-4 tensor with four axes.

Shape: [Batch, Channels, Height, Width]

  • For the input tensor:

    • Batch: The length of this axis tells us how many samples are in our batch.

    • Channels: This axis represents the color channels. Typical values here are 3 for RGB images or 1 if we are working with grayscale images.

    • Height and Width: The image height and width. Possible values here are 28 x 28, or the 224 x 224 image size that is used by the VGG16 neural network, or any other image dimensions we can imagine.

  • For the output tensor:

    • Batch: The same as for the input tensor.
    • Channels: Can be seen as modified color channels (the feature maps produced by the convolution).

N.B. The last axis, which is where we'll start, is where the actual numbers or data values are located.
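As an illustration of the [Batch, Channels, Height, Width] layout, the sketch below uses a hypothetical batch of three grayscale 28 x 28 images filled with zeros. Each additional index peels off one axis until a single value is reached on the last axis.

```python
import torch

# Hypothetical batch: 3 grayscale images of 28 x 28 pixels, filled with zeros.
batch = torch.zeros(3, 1, 28, 28)

n_images, n_channels, height, width = batch.shape

first_image = batch[0]             # shape [1, 28, 28]: one image, all channels
first_channel = batch[0][0]        # shape [28, 28]: one channel of that image
first_row = batch[0][0][0]         # shape [28]: one row of pixels
first_pixel = batch[0][0][0][0]    # rank-0 tensor: a single pixel value

print(first_image.shape, first_channel.shape, first_row.shape, first_pixel.shape)
```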

2.4.PyTorch Tensors Explained - Neural Network Programming

When programming neural networks, data preprocessing is often one of the first steps in the overall process, and one goal of data preprocessing is to transform the raw input data into tensor form.

(1) Tensor Attributes

Rank, axes, and shape are attributes of every kind of tensor, but now we want to talk about tensors in PyTorch specifically.

Every torch.Tensor has these attributes: torch.dtype, torch.device, and torch.layout:

  • torch.dtype: The dtype specifies the type of the data that is contained within the tensor.
  • torch.device: This determines where tensor computations for the given tensor will be performed.
  • torch.layout: The layout specifies how the tensor is stored in memory.



python
import torch

t = torch.Tensor()
print(t.dtype)   # torch.float32
print(t.device)  # cpu
print(t.layout)  # torch.strided

(2) torch.dtype

Tensors contain uniform (of the same type) numerical data with one of these types:

| Data type | dtype | CPU tensor | GPU tensor |
| --- | --- | --- | --- |
| 32-bit floating point | torch.float32 | torch.FloatTensor | torch.cuda.FloatTensor |
| 64-bit floating point | torch.float64 | torch.DoubleTensor | torch.cuda.DoubleTensor |
| 16-bit floating point | torch.float16 | torch.HalfTensor | torch.cuda.HalfTensor |
| 8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
| 8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
| 16-bit integer (signed) | torch.int16 | torch.ShortTensor | torch.cuda.ShortTensor |
| 32-bit integer (signed) | torch.int32 | torch.IntTensor | torch.cuda.IntTensor |
| 64-bit integer (signed) | torch.int64 | torch.LongTensor | torch.cuda.LongTensor |

(The table comes from PyTorch Tensors Explained - Neural Network Programming - deeplizard)

N.B. Older PyTorch releases required tensor operations to happen between tensors with the same type of data. Recent releases (1.3 and later) promote mixed dtypes to a common type, but it is still good practice to keep dtypes consistent and to convert explicitly.
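A short sketch of how dtypes behave in practice, assuming a recent PyTorch release with type promotion (the variable names are illustrative). Integer and float tensors are created with inferred dtypes and then combined, both implicitly and with an explicit conversion.

```python
import torch

ints = torch.tensor([1, 2, 3])            # Python ints are inferred as torch.int64
floats = torch.tensor([1.0, 2.0, 3.0])    # Python floats use the default float dtype

# Mixed-dtype arithmetic is promoted to a common type in recent releases.
mixed = ints + floats

# An explicit conversion makes the intent visible and works on any version.
explicit = ints.to(torch.float32) + floats

print(ints.dtype, floats.dtype, mixed.dtype)
```

Converting explicitly is the safer habit when code has to run across several PyTorch versions.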

(3) torch.device

PyTorch supports the use of multiple devices. A device can be chosen depending on what is available:

python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


A specific GPU can also be selected using an index like so:

python
device = torch.device('cuda:0')

The complete code is:

python
import torch

t = torch.Tensor()
print(t.device)  # cpu: tensors are created on the CPU by default

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)    # cuda on a machine with a CUDA GPU, otherwise cpu
device = torch.device('cuda:0')
print(device)    # cuda:0

So, if you have multiple GPUs, you can select a particular GPU by using a different index.

N.B. Tensor operations between tensors must happen between tensors that exist on the same device.
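The same-device rule can be sketched as follows. This is a minimal example that falls back to the CPU when no GPU is present, so it should run on either kind of machine; the tensor values are made up.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([1.0, 2.0, 3.0])   # created on the CPU by default
b = torch.tensor([4.0, 5.0, 6.0])

# .to() returns a tensor on the target device (a no-op if it is already there)
a_dev = a.to(device)
b_dev = b.to(device)

total = a_dev + b_dev               # both operands live on the same device

# Move the result back to the CPU before handing it to NumPy or plain Python.
total_cpu = total.cpu()
print(total.device, total_cpu.tolist())
```

Mixing a CPU tensor with a GPU tensor in one operation raises a RuntimeError, which is why both operands are moved before the addition.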

(4) torch.layout

We rarely need to care about this attribute; the default layout, torch.strided, is almost always what we want.

Reference: Stride of an array - Wikipedia

(5) Creating tensors using data
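python
import torch
import numpy as np
data = np.array([1,2,3])

o1 = torch.Tensor(data)
o2 = torch.tensor(data)
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

print(o1)
print(o2)
print(o3)
print(o4)

The result is:

tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([1, 2, 3])
tensor([1, 2, 3])

The first option (o1) has dots after the numbers, indicating that the numbers are floats (torch.float32), while the next three options keep the integer dtype of the NumPy array. Depending on the platform and NumPy version, that is int32 or int64; PyTorch prints a dtype suffix for int32, so the output above without a suffix corresponds to int64.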

(6) Creation options without data
python
import torch

o1 = torch.eye(2)
o2 = torch.zeros([2,2])
o3 = torch.ones([2,2])
o4 = torch.rand([2,2])

print(o1)
print(o2)
print(o3)
print(o4)

The result is:

tensor([[1., 0.],
        [0., 1.]])
tensor([[0., 0.],
        [0., 0.]])
tensor([[1., 1.],
        [1., 1.]])
tensor([[0.4056, 0.8623],
        [0.5164, 0.9960]])
2.5.Creating PyTorch Tensors - Best Options
(1) The difference between torch.Tensor and torch.tensor

A quick recap:

python
import torch
import numpy as np
data = np.array([1,2,3])

o1 = torch.Tensor(data)
o2 = torch.tensor(data)

print(o1)
print(o2)

The result is:

tensor([1., 2., 3.])
tensor([1, 2, 3])

The first option with the uppercase T is the constructor of the torch.Tensor class, and the second option is what we call a factory function that constructs torch.Tensor objects and returns them to the caller. (Factory functions are a software design pattern for creating objects.)

It's fine to use either one. However, the factory function torch.tensor() has better documentation and more configuration options, so it gets the winning spot at the moment.
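One practical difference behind the "more configuration options" remark is dtype handling. The sketch below (variable names are illustrative) shows that the constructor always produces the global default dtype, while the factory function infers the dtype from its input and also accepts an explicit one.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

from_constructor = torch.Tensor(data)   # always uses the global default dtype
from_factory = torch.tensor(data)       # infers the dtype from the input data

default_dtype = torch.get_default_dtype()

# The factory function also accepts an explicit dtype.
as_float64 = torch.tensor(data, dtype=torch.float64)

print(from_constructor.dtype, from_factory.dtype, as_float64.dtype)
```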

(2) The difference between torch.as_tensor and torch.from_numpy
python
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

print(o3)
print(o4)

The result is:

tensor([1, 2, 3])
tensor([1, 2, 3])

The third and the fourth option are factory functions, too.

The torch.from_numpy() function only accepts numpy.ndarrays, while the torch.as_tensor() function accepts a wide variety of array-like objects including other PyTorch tensors. For this reason, torch.as_tensor() is the winning choice.
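The difference in accepted inputs can be seen in a few lines. This is a small sketch with made-up values: torch.as_tensor() takes a list, an ndarray, or an existing tensor, while torch.from_numpy() rejects anything that is not an ndarray.

```python
import numpy as np
import torch

from_list = torch.as_tensor([1, 2, 3])
from_array = torch.as_tensor(np.array([1, 2, 3]))

existing = torch.tensor([1, 2, 3])
from_tensor = torch.as_tensor(existing)   # same dtype and device: no copy is made

try:
    torch.from_numpy([1, 2, 3])           # a list is not an ndarray
    from_numpy_accepts_list = True
except TypeError:
    from_numpy_accepts_list = False

print(from_list, from_array, from_tensor, from_numpy_accepts_list)
```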

(3) The difference between the first two and the last two

If we change the data like this:

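python
import torch
import numpy as np
data = np.array([1,2,3])

o1 = torch.Tensor(data)
o2 = torch.tensor(data)
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

print(o1)
print(o2)
print(o3)
print(o4)

# modify the original array after the tensors have been created
data[0] = 0
data[1] = 0
data[2] = 0

print(o1)
print(o2)
print(o3)
print(o4)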

The result is:

tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([1, 2, 3])
tensor([1, 2, 3])
tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([0, 0, 0])
tensor([0, 0, 0])

This happens because torch.Tensor() and torch.tensor() copy their input data while torch.as_tensor() and torch.from_numpy() share their input data in memory with the original input object.

Given all of these details, these two are the best options:

  • torch.tensor()
  • torch.as_tensor()

The torch.tensor() call is the sort of go-to call, while torch.as_tensor() should be employed when tuning our code for performance.


  1. Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.
  2. The memory sharing of as_tensor() doesn't work with built-in Python data structures like lists.
  3. The as_tensor() call requires developer knowledge of the sharing feature. This is necessary so we don't inadvertently make an unwanted change in the underlying data without realizing the change impacts multiple objects.
  4. The as_tensor() performance improvement will be greater if there are a lot of back and forth operations between numpy.ndarray objects and tensor objects. However, if there is just a single load operation, there shouldn't be much impact from a performance perspective.
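Points 2 and 3 above can be illustrated with a short sketch (the values are made up). Memory is shared only when the input is an ndarray and no conversion is needed; a Python list, or a dtype change, results in a copy.

```python
import numpy as np
import torch

array = np.array([1, 2, 3])
source_list = [1, 2, 3]

shared = torch.as_tensor(array)                           # shares memory with the ndarray
from_list = torch.as_tensor(source_list)                  # a list is copied into new memory
converted = torch.as_tensor(array, dtype=torch.float32)   # changing the dtype forces a copy

# Modify the original objects after the tensors exist.
array[0] = 0
source_list[0] = 0

print(shared, from_list, converted)
```

Only `shared` reflects the change, which is exactly the behavior a developer has to keep in mind when using as_tensor() for performance.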

3.Section 3: Tensor Operations

We have the following high-level categories of operations:

  1. Reshaping operations
  2. Element-wise operations
  3. Reduction operations
  4. Access operations
3.1.Flatten, Reshape, and Squeeze Explained - Tensors for Deep Learning
(1) Reshape


python
import torch

t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

print(t)
print(t.shape)
print(t.reshape(1,12))
print(t.reshape(2,6))
print(t.reshape(3,4))
print(t.reshape(4,3))

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])
torch.Size([3, 4])
tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])
tensor([[1., 1., 1., 1., 2., 2.],
        [2., 2., 3., 3., 3., 3.]])
tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])
tensor([[1., 1., 1.],
        [1., 2., 2.],
        [2., 2., 3.],
        [3., 3., 3.]])

N.B. t.reshape doesn't change t itself; it returns a new tensor.


python
print(t)

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

We can also increase or decrease the rank. For example:


python
print(t.reshape(2,2,3))

tensor([[[1., 1., 1.],
         [1., 2., 2.]],

        [[2., 2., 3.],
         [3., 3., 3.]]])
(2) Flatten

A flatten operation on a tensor reshapes the tensor to have a shape that is equal to the number of elements contained in the tensor. This is the same thing as a 1d-array of elements.

Flattening a tensor means to remove all of the dimensions except for one.

We'll see that flatten operations are required when passing an output tensor from a convolutional layer to a linear layer.

Create a Python function:

python
def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

  • In PyTorch, the -1 tells the reshape() function to figure out what the value should be based on the number of elements contained within the tensor.
  • Squeezing a tensor removes the dimensions or axes that have a length of one.


python
print(flatten(t))

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])
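To see what each step of the function does, the sketch below tracks the shape after the reshape and after the squeeze, using a made-up 4 × 3 tensor of ones. It also compares the result with the built-in flatten() method, which gives the same outcome in one call.

```python
import torch

t = torch.ones(4, 3)

step1 = t.reshape(1, -1)   # -1 is inferred as 12, giving shape [1, 12]
step2 = step1.squeeze()    # the length-1 axis is removed, giving shape [12]

builtin = t.flatten()      # built-in method with the same result
direct = t.reshape(-1)     # reshaping straight to one axis also works

print(step1.shape, step2.shape, builtin.shape, direct.shape)
```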
3.2.CNN Flatten Operation Visualized - Tensor Batch Processing

In past posts, we learned about flattening an entire tensor image. But when working with CNNs, we want to only flatten specific axes within the tensor.



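python
import torch

t1 = torch.tensor([
    [1,1,1,1], [1,1,1,1], [1,1,1,1], [1,1,1,1]
])

t2 = torch.tensor([
    [2,2,2,2], [2,2,2,2], [2,2,2,2], [2,2,2,2]
])

t3 = torch.tensor([
    [3,3,3,3], [3,3,3,3], [3,3,3,3], [3,3,3,3]
])

t = torch.stack((t1, t2, t3))   # shape [3, 4, 4]
t = t.reshape(3,1,4,4)          # add a color channel axis of length 1

print(t)
print(t.flatten(start_dim=1))
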
tensor([[[[1, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1]]],

        [[[2, 2, 2, 2],
          [2, 2, 2, 2],
          [2, 2, 2, 2],
          [2, 2, 2, 2]]],

        [[[3, 3, 3, 3],
          [3, 3, 3, 3],
          [3, 3, 3, 3],
          [3, 3, 3, 3]]]])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])

Notice in the call how we specified the start_dim parameter. This tells the flatten() method at which axis it should start the flatten operation. The start_dim=1 here is an index, so it refers to the second axis, which is the color channel axis. We skip over the batch axis, so to speak, leaving it intact.

The video ends with a question for the viewer: how should we flatten RGB images if we want to keep the color channels? Simply change start_dim=1 to start_dim=2.
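The answer to that question can be checked with a hypothetical batch of two RGB images of 4 x 4 pixels (the values come from torch.arange purely so that each element is identifiable).

```python
import torch

# Hypothetical batch: 2 RGB images of 4 x 4 pixels.
batch = torch.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)

per_image = batch.flatten(start_dim=1)     # shape [2, 48]: channels merged into one axis
per_channel = batch.flatten(start_dim=2)   # shape [2, 3, 16]: channel axis preserved

print(per_image.shape, per_channel.shape)
```

With start_dim=2 only the height and width axes are merged, so each image keeps three separate channel vectors of 16 values.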

3.3.Tensors for Deep Learning - Broadcasting and Element-wise Operations

An element-wise operation operates on corresponding elements between tensors.

Two elements are said to be corresponding if the two elements occupy the same position within the tensor. The position is determined by the indexes used to locate each element.

N.B. Two tensors must have the same shape, or shapes that can be broadcast to a common shape, in order to perform element-wise operations on them.

(1) Arithmetic operations

Arithmetic operations are element-wise operations.


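python
import torch

t1 = torch.tensor([[1,2], [3,4]], dtype=torch.float32)
t2 = torch.tensor([[5,6], [7,8]], dtype=torch.float32)

print(t1 + t2)
print(t1 - t2)
print(t1 * t2)
print(t1 / t2)

print(t1 + 3)
print(t1 - 3)
print(t1 * 3)
print(t1 / 3)

print(t1.add(3))
print(t1.sub(3))
print(t1.mul(3))
print(t1.div(3))
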
tensor([[ 6.,  8.],
        [10., 12.]])
tensor([[-4., -4.],
        [-4., -4.]])
tensor([[ 5., 12.],
        [21., 32.]])
tensor([[0.2000, 0.3333],
        [0.4286, 0.5000]])
tensor([[4., 5.],
        [6., 7.]])
tensor([[-2., -1.],
        [ 0.,  1.]])
tensor([[ 3.,  6.],
        [ 9., 12.]])
tensor([[0.3333, 0.6667],
        [1.0000, 1.3333]])
tensor([[4., 5.],
        [6., 7.]])
tensor([[-2., -1.],
        [ 0.,  1.]])
tensor([[ 3.,  6.],
        [ 9., 12.]])
tensor([[0.3333, 0.6667],
        [1.0000, 1.3333]])
(*) Broadcasting Tensors

Broadcasting is not an operation in its own right, but we need to understand it.

Broadcasting describes how tensors with different shapes are treated during element-wise operations. It is the concept whose implementation allows us to add scalars to higher dimensional tensors.


python
import torch
import numpy as np

t1 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

print(np.broadcast_to(3, t1.shape))


[[3 3]
 [3 3]]

This all happens under the hood, that is, inside the internal workings of the library.

So, t1 + 3 is really this:

python
t1 + torch.tensor(
    np.broadcast_to(3, t1.shape),
    dtype=torch.float32
)
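Broadcasting is not limited to scalars. As a sketch with made-up values, the example below adds a rank-1 tensor and a column tensor to a 2 × 3 matrix; expand() is used only to show the shape the smaller tensor is stretched to.

```python
import torch

matrix = torch.tensor([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])
row = torch.tensor([10.0, 20.0, 30.0])       # shape [3]
column = torch.tensor([[100.0], [200.0]])    # shape [2, 1]

row_sum = matrix + row          # row is repeated along axis 0
column_sum = matrix + column    # column is repeated along axis 1

# expand() shows the broadcast shape without copying the data
row_expanded = row.expand(2, 3)

print(row_sum)
print(column_sum)
```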
(2) Comparison Operations

Testing code:

python
import torch

print(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))


tensor([ True, False, False])

Comparison operations are element-wise operations.


python
import torch

t = torch.tensor([
    [0,-5,0],
    [6,0,7],
    [0,8,0]
], dtype=torch.float32)

print(t.eq(0))          # equal to
print(t.ge(0))          # greater than or equal to
print(t.gt(0))          # greater than
print(t.lt(0))          # less than
print(t.le(7))          # less than or equal to


tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])
tensor([[ True, False,  True],
        [ True,  True,  True],
        [ True,  True,  True]])
tensor([[False, False, False],
        [ True, False,  True],
        [False,  True, False]])
tensor([[False,  True, False],
        [False, False, False],
        [False, False, False]])
tensor([[ True,  True,  True],
        [ True,  True,  True],
        [ True, False,  True]])
(3) Some Functions


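python
import torch

t = torch.tensor([
    [0,-5,0],
    [6,0,7],
    [0,8,0]
], dtype=torch.float32)

print(t.abs())          # absolute value
print(t.sqrt())         # square root (nan for negative values)
print(t.neg())          # negation
print(t.neg().abs())    # negation followed by absolute value
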
tensor([[0., 5., 0.],
        [6., 0., 7.],
        [0., 8., 0.]])
tensor([[0.0000,    nan, 0.0000],
        [2.4495, 0.0000, 2.6458],
        [0.0000, 2.8284, 0.0000]])
tensor([[-0.,  5., -0.],
        [-6., -0., -7.],
        [-0., -8., -0.]])
tensor([[0., 5., 0.],
        [6., 0., 7.],
        [0., 8., 0.]])
3.4.ArgMax and Reduction Ops - Tensors for Deep Learning
(1) Reduction Options



python
import torch

t = torch.tensor([
], dtype=torch.float32)

print(t.prod())             # product
print(t.mean())             # average
print(t.std())              # standard deviation



Here is a question though: Do reduction operations always reduce to a tensor with a single element?

The answer is no!

In fact, we often reduce specific axes at a time. This process is important. It's just like we saw with reshaping when we aimed to flatten the image tensors within a batch while still maintaining the batch axis.



python
import torch

t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

print(t.sum(dim=0))
print(t.sum(dim=1))

tensor([6., 6., 6., 6.])
tensor([ 4.,  8., 12.])

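Recall this phrase: reduce specific axes at a time.


| dim | dim1=0 | dim1=1 | dim1=2 | dim1=3 |
| --- | --- | --- | --- | --- |
| dim0=0 | 1 | 1 | 1 | 1 |
| dim0=1 | 2 | 2 | 2 | 2 |
| dim0=2 | 3 | 3 | 3 | 3 |


Summing over dim=1 collapses each row of this table into a single value:

tensor([ 4.,  8., 12.])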
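To make "reduce specific axes" concrete, the sketch below compares sum(dim=...) with the same additions written out by hand on the 3 × 4 tensor used above. The keepdim argument is shown as an optional extra that keeps the reduced axis with length one.

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
], dtype=torch.float32)

# Reducing dim=0 adds the rows element by element: one value per column.
sum_dim0 = t.sum(dim=0)
manual_dim0 = t[0] + t[1] + t[2]

# Reducing dim=1 adds the values inside each row: one value per row.
sum_dim1 = t.sum(dim=1)
manual_dim1 = torch.stack([t[0].sum(), t[1].sum(), t[2].sum()])

# keepdim=True keeps the reduced axis with length 1 instead of dropping it.
kept = t.sum(dim=1, keepdim=True)

print(sum_dim0, sum_dim1, kept.shape)
```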


(2) Argmax

Argmax is a mathematical function that tells us which argument, when supplied to a function as input, results in the function's max output value. Argmax returns the index location of the maximum value inside a tensor.


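python
import torch

t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

print(t.max())
print(t.argmax())
print(t.flatten())

tensor(5.)
tensor(11)
tensor([1., 0., 0., 2., 0., 3., 3., 0., 4., 0., 0., 5.])

If we don't specify an axis to the argmax() method, it returns the index location of the max value from the flattened tensor, which in this case is indeed 11.

Let's see how we can work with specific axes now.


python
import torch

t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

print(t.max(dim=0))
print(t.argmax(dim=0))
print(t.max(dim=1))
print(t.argmax(dim=1))

torch.return_types.max(
values=tensor([4., 3., 3., 5.]),
indices=tensor([2, 1, 1, 2]))

tensor([2, 1, 1, 2])

torch.return_types.max(
values=tensor([2., 3., 5.]),
indices=tensor([3, 1, 3]))

tensor([3, 1, 3])


| dim | dim1=0 | dim1=1 | dim1=2 | dim1=3 |
| --- | --- | --- | --- | --- |
| dim0=0 | 1 | 0 | 0 | 2 |
| dim0=1 | 0 | 3 | 3 | 0 |
| dim0=2 | 4 | 0 | 0 | 5 |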


In practice, we often use the argmax() function on a network's output prediction tensor, to determine which category has the highest prediction value.
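A minimal sketch of that use case follows. The prediction values and labels are made up for illustration and do not come from a trained network; argmax(dim=1) picks the highest-scoring class per sample, and eq() counts how many match the labels.

```python
import torch

# Hypothetical network output: 3 samples, 4 classes (values made up for illustration)
preds = torch.tensor([
    [0.1, 0.7, 0.1, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.2, 0.2, 0.5, 0.1],
])
labels = torch.tensor([1, 0, 3])

predicted_classes = preds.argmax(dim=1)                   # best class index per sample
num_correct = predicted_classes.eq(labels).sum().item()   # element-wise comparison, then a reduction

print(predicted_classes, num_correct)
```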

(3) Accessing elements inside tensors

For a scalar valued tensor, we use t.item():


python
import torch

t1 = torch.tensor([5], dtype=torch.float32)
print(t1)          # tensor([5.])
print(t1.item())   # 5.0



For multiple values, we use t.tolist() or t.numpy():

python
import torch

t2 = torch.tensor([5,6], dtype=torch.float32)
print(t2)
print(t2.tolist())
print(t2.numpy())

tensor([5., 6.])
[5.0, 6.0]
[5. 6.]

We can access the numeric values by transforming the tensor into a Python list or a NumPy array.

<贰>Part 2: Neural Network Training


LengYuuu/PyTorchStudy_DeepLizard: This repository is used to house code for DeepLizard's PyTorch course. (github.com)

PyTorchStudy_DeepLizard: This repository is used to house code for DeepLizard's PyTorch course. (gitee.com)

The project (Bird's-eye view)

There are four general steps that we'll be following as we move through this project:

  1. Prepare the data (Section 1)
  2. Build the model (Section 2)
  3. Train the model (Section 3)
  4. Analyze the model's results (Section 4)

Personal Suggestion: In this part, you need to write a lot and read a lot. I will write down all the code. You can of course copy it directly into your project, but remember to read it carefully, think about it, and run the program yourself.

1.Section 1: Data and Data Processing

Bird's eye view of the process

From a high-level perspective or bird's eye view of our deep learning project, the first step is to prepare our data; once that is done, we are ready to build our model.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model's results
1.1.Importance of Data in Deep Learning - Fashion MNIST for AI

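This lesson introduces a dataset: Fashion MNIST.

I am not sure whether this is an advertisement, but the gist is that the traditional MNIST, the handwritten digit dataset, is too easy and no longer novel, so Fashion MNIST was created.

MNIST classifies 10 digits, while Fashion MNIST classifies 10 different kinds of clothing.

| Index | Label |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |

Dataset link: zalandoresearch/fashion-mnist: A MNIST-like fashion product database. Benchmark (github.com)

Official paper: 1708.07747 (arxiv.org)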

1.2.Extract, Transform, Load (ETL) - Deep Learning Data Preparation
(1) What is "ETL"

To prepare our data, we'll be following what is loosely known as an ETL process.

  • Extract data from a data source.
  • Transform data into a desirable format.
  • Load data into a suitable structure.

Our ultimate goal when preparing our data is to do the following (ETL):

  1. Extract -- Get the Fashion-MNIST image data from the source.
  2. Transform -- Put our data into tensor form.
  3. Load -- Put our data into an object to make it easily accessible.
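
The three ETL steps can be tried out without downloading anything. The sketch below is only an illustration, not the Fashion-MNIST pipeline itself: the random uint8 images, the FakeImages class, and the batch size of 4 are all assumptions made up for the example.

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Extract: stand-in for reading raw files; 8 fake grayscale images (values 0..255).
torch.manual_seed(0)
raw_images = torch.randint(0, 256, (8, 28, 28), dtype=torch.uint8)
raw_labels = torch.randint(0, 10, (8,))


class FakeImages(Dataset):
    """Hypothetical dataset that wraps the raw tensors above."""

    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # Transform: scale to [0, 1] floats and add a channel axis -> (1, 28, 28).
        image = self.images[index].float().div(255).unsqueeze(0)
        return image, int(self.labels[index])


# Load: wrap the dataset so that it can be read in batches.
dataset = FakeImages(raw_images, raw_labels)
loader = DataLoader(dataset, batch_size=4)

images, labels = next(iter(loader))
print(images.shape, images.dtype, labels.shape)
```

Keeping the transform inside `__getitem__` means each sample is converted only when it is requested, which is also how the `transform` argument of the torchvision datasets is applied.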
(2) How to ETL with PyTorch




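python
import torchvision
import torchvision.transforms as transforms

# Extract and transform: fetch Fashion-MNIST and convert every image to a tensor.
train_set = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed path; the original value is missing from this listing
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor()
    ])
)
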
Parameter Description
root The location on disk where the data is located.
train If the dataset is the training set
download If the data should be downloaded.
transform A composition of transformations that should be performed on the dataset elements.

Let's start with the first line of code: train_set = torchvision.datasets.FashionMNIST()


python
from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip


python
class FashionMNIST(MNIST):
    """`Fashion-MNIST <https://github.com/zalandoresearch/fashion-mnist>`_ Dataset.

    Args:
        root (str or ``pathlib.Path``): Root directory of dataset where ``FashionMNIST/raw/train-images-idx3-ubyte``
            and  ``FashionMNIST/raw/t10k-images-idx3-ubyte`` exist.
        train (bool, optional): If True, creates dataset from ``train-images-idx3-ubyte``,
            otherwise from ``t10k-images-idx3-ubyte``.
        download (bool, optional): If True, downloads the dataset from the internet and
            puts it in root directory. If dataset is already downloaded, it is not
            downloaded again.
        transform (callable, optional): A function/transform that  takes in a PIL image
            and returns a transformed version. E.g, ``transforms.RandomCrop``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
    """

    mirrors = ["http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/"]

    resources = [
        ("train-images-idx3-ubyte.gz", "8d4fb7e6c68d591d4c3dfef9ec88bf0d"),
        ("train-labels-idx1-ubyte.gz", "25c81989df183df01b3e8a0aad5dffbe"),
        ("t10k-images-idx3-ubyte.gz", "bef4ecab320f06d8554ea6380940ec79"),
        ("t10k-labels-idx1-ubyte.gz", "bb300cfdad3c16e7a12a480ee83cd310"),
    ]
    classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]



python
class MNIST(VisionDataset):
    """`MNIST <http://yann.lecun.com/exdb/mnist/>`_ Dataset.

    Args:
        root (str or ``pathlib.Path``): Root directory of dataset where ``MNIST/raw/train-images-idx3-ubyte``
            and  ``MNIST/raw/t10k-images-idx3-ubyte`` exist.
        train (bool, optional): If True, creates dataset from ``train-images-idx3-ubyte``,
            otherwise from ``t10k-images-idx3-ubyte``.
        download (bool, optional): If True, downloads the dataset from the internet and
            puts it in root directory. If dataset is already downloaded, it is not
            downloaded again.
        transform (callable, optional): A function/transform that  takes in a PIL image
            and returns a transformed version. E.g, ``transforms.RandomCrop``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
    """

    mirrors = [
        # the mirror URLs are not shown in this excerpt
    ]

    resources = [
        ("train-images-idx3-ubyte.gz", "f68b3c2dcbeaaa9fbdd348bbdeb94873"),
        ("train-labels-idx1-ubyte.gz", "d53e105ee54ea40749a09fcbcd1e9432"),
        ("t10k-images-idx3-ubyte.gz", "9fb629c4189551a2d022fa330f9573f3"),
        ("t10k-labels-idx1-ubyte.gz", "ec29112dd5afa0611ce80d1b7f02629c"),
    ]

    training_file = "training.pt"
    test_file = "test.pt"
    classes = [
        "0 - zero",
        "1 - one",
        "2 - two",
        "3 - three",
        "4 - four",
        "5 - five",
        "6 - six",
        "7 - seven",
        "8 - eight",
        "9 - nine",
    ]

    # ... (the rest is too long to paste here)

    def __init__(
        self,
        root: Union[str, Path],
        train: bool = True,
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        download: bool = False,
    ) -> None:
        super().__init__(root, transform=transform, target_transform=target_transform)
        self.train = train  # training set or test set

        if self._check_legacy_exist():
            self.data, self.targets = self._load_legacy_data()
            return

        if download:
            self.download()

        if not self._check_exists():
            raise RuntimeError("Dataset not found. You can use download=True to download it")

        self.data, self.targets = self._load_data()

    def _check_legacy_exist(self):
        processed_folder_exists = os.path.exists(self.processed_folder)
        if not processed_folder_exists:
            return False

        return all(
            check_integrity(os.path.join(self.processed_folder, file)) for file in (self.training_file, self.test_file)
        )

    def _load_legacy_data(self):
        # This is for BC only. We no longer cache the data in a custom binary, but simply read from the raw data
        # directly.
        data_file = self.training_file if self.train else self.test_file
        return torch.load(os.path.join(self.processed_folder, data_file), weights_only=True)

    def _load_data(self):
        image_file = f"{'train' if self.train else 't10k'}-images-idx3-ubyte"
        data = read_image_file(os.path.join(self.raw_folder, image_file))

        label_file = f"{'train' if self.train else 't10k'}-labels-idx1-ubyte"
        targets = read_label_file(os.path.join(self.raw_folder, label_file))

        return data, targets

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        """
        Args:
            index (int): Index

        Returns:
            tuple: (image, target) where target is index of the target class.
        """
        img, target = self.data[index], int(self.targets[index])

        # doing this so that it is consistent with all other datasets
        # to return a PIL Image
        img = Image.fromarray(img.numpy(), mode="L")

        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img, target

    def __len__(self) -> int:
        return len(self.data)


python
# Calling a class creates an object (the constructor arguments are elided here).
train_set = torchvision.datasets.FashionMNIST(...)

python
# The same pattern with a simple class: student1 is an object of Students.
student1 = Students('Jim', 18)

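transform: A composition of transformations that should be performed on the dataset elements.

The only part that may still be puzzling is the square brackets "[]". Read the description of transform closely: it says "a composition of transformations". Note the plural s: there may be several transformations, although here ToTensor is the only one. Compose itself expects a list, so the brackets are required whenever Compose is used. What is optional with a single transformation is Compose, because transforms.ToTensor() could be passed to transform directly. Wrapping it in Compose anyway keeps the writing style consistent with the cases that need several transformations.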
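
To see why a list is the natural input here, the sketch below is a simplified stand-in for what a composition does: it applies each transformation in order. SimpleCompose, to_float, and scale are hypothetical names invented for this illustration and are not part of torchvision.

```python
class SimpleCompose:
    """Simplified stand-in for a composition of transformations."""

    def __init__(self, transforms):
        self.transforms = transforms  # an ordered list of callables

    def __call__(self, value):
        # Feed the output of each transformation into the next one.
        for transform in self.transforms:
            value = transform(value)
        return value


def to_float(pixels):
    """Convert integer pixel values to floats."""
    return [float(p) for p in pixels]


def scale(pixels):
    """Scale pixel values from 0..255 to 0..1."""
    return [p / 255 for p in pixels]


one_step = SimpleCompose([to_float])          # a list with a single transformation
two_steps = SimpleCompose([to_float, scale])  # the same list, extended

print(one_step([0, 255]))   # [0.0, 255.0]
print(two_steps([0, 255]))  # [0.0, 1.0]
```

Adding a second transformation only means adding one more element to the list, which is the convenience the bracket syntax buys.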


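python
import torch
import torchvision
import torchvision.transforms as transforms

train_set = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed path; the original value is missing from this listing
    , train=True
    , download=True
    , transform=transforms.Compose([transforms.ToTensor()])
)

# Only the first argument survives in this listing; the other options are described below.
train_loader = torch.utils.data.DataLoader(train_set)
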

Parameter Description
batch_size How many samples per batch to load.
shuffle Set to True to have the data reshuffled at every epoch.
num_workers How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
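
A small sketch of how batch_size and shuffle behave, using a made-up TensorDataset of ten numbers instead of Fashion-MNIST (the data, the batch size of 4, and the seed are assumptions for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten samples with the values 0..9 stand in for a real dataset.
data = torch.arange(10)
dataset = TensorDataset(data)

# Without shuffling, batches come out in order, 4 samples at a time.
ordered = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
ordered_batches = [batch[0].tolist() for batch in ordered]
print(ordered_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# With shuffle=True the order is redrawn for every pass (epoch) over the data.
torch.manual_seed(0)
shuffled = DataLoader(dataset, batch_size=4, shuffle=True)
epoch_1 = [x for batch in shuffled for x in batch[0].tolist()]
epoch_2 = [x for batch in shuffled for x in batch[0].tolist()]
print(epoch_1)
print(epoch_2)
```

Note that the last batch is smaller when the dataset size is not a multiple of batch_size.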



1.3.PyTorch Datasets and DataLoaders - Training Set Exploration
(1) PyTorch Dataset: Working with the training set
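  1. Typical functions:

    python
    import torch
    import torchvision
    import torchvision.transforms as transforms
    train_set = torchvision.datasets.FashionMNIST(
        root='./data'  # assumed path; the original value is missing from this listing
        , train=True
        , download=True
        , transform=transforms.Compose([transforms.ToTensor()])
    )
    train_loader = torch.utils.data.DataLoader(train_set)  # further arguments are missing from this listing

    print(len(train_set))                # number of images
    print(train_set.targets)             # label of every image
    print(train_set.targets.bincount())  # number of images per label

    60000
    tensor([9, 0, 0,  ..., 3, 0, 5])
    tensor([6000, 6000, 6000, 6000, 6000, 6000, 6000, 6000, 6000, 6000])
    • To see how many images are in our training set, we check the length of the dataset with the Python len() function;
    • To see the label of each image, we use the train_set.targets attribute;
    • To see how many images carry each label, we use the PyTorch bincount() method;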
  2. Class imbalance

    Class imbalance is a common problem, but in our case, we have just seen that the Fashion-MNIST dataset is indeed balanced, so we need not worry about that for our project.

  3. Accessing data in the training set

    To access a single sample in the training set:

    python
    sample = next(iter(train_set))
    print('len:', len(sample))
    image, label = sample
    print('types:', type(image), type(label))
    print('shape:', image.shape)


    len: 2
    types: <class 'torch.Tensor'> <class 'int'>
    shape: torch.Size([1, 28, 28])

    The code image, label = sample is equivalent to image = sample[0], label = sample[1].

    We don't have to worry too much about how next and iter work.

    To display the image on screen:

    python
    import torch
    import torchvision
    import torchvision.transforms as transforms
    import matplotlib.pyplot as plt
    # import numpy as np
    train_set = torchvision.datasets.FashionMNIST(
        root='./data'  # assumed path; the original value is missing from this listing
        , train=True
        , download=True
        , transform=transforms.Compose([
            transforms.ToTensor()
        ])
    )
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=10
    )
    sample = next(iter(train_set))
    print('len:', len(sample))
    image, label = sample
    print('types:', type(image), type(label))
    print('shape:', image.shape)
    print('label:', label)
    plt.imshow(image.squeeze(), cmap="gray")  # squeeze drops the channel axis: (1, 28, 28) -> (28, 28)
    plt.show()  # needed when running as a script rather than in a notebook

    This requires the import: import matplotlib.pyplot as plt.


    len: 2
    types: <class 'torch.Tensor'> <class 'int'>
    shape: torch.Size([1, 28, 28])
    label: 9
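
The inspection steps from this subsection (length, labels, per-label counts, and pulling one sample) can be tried on a tiny hand-made dataset. This is a sketch under assumptions: the six labels below are invented, and TensorDataset stands in for the Fashion-MNIST training set.

```python
import torch
from torch.utils.data import TensorDataset

# Six fake 1x28x28 images with invented labels (three classes).
images = torch.zeros(6, 1, 28, 28)
targets = torch.tensor([0, 2, 1, 2, 0, 2])
toy_set = TensorDataset(images, targets)

print(len(toy_set))          # number of samples
counts = targets.bincount()
print(counts)                # how many samples carry each label

# A balanced dataset has the same count for every class; this one does not.
is_balanced = bool((counts == counts[0]).all())
print(is_balanced)

# next(iter(...)) pulls the first sample, which unpacks into image and label.
image, label = next(iter(toy_set))
print(image.shape, label)
```

With Fashion-MNIST the same check reports 6000 for every class, which is why class imbalance is not a concern there.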
(2) PyTorch DataLoader: Working with batches of data

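Unlike the code we just wrote, which read a single sample from the dataset, here we read a whole batch from a DataLoader:

python
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

train_set = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed path; the original value is missing from this listing
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

display_loader = torch.utils.data.DataLoader(
    train_set, batch_size=10
)

batch = next(iter(display_loader))
print('len:', len(batch))

images, labels = batch
print('types:', type(images), type(labels))
print('shapes:', images.shape, labels.shape)

print('labels:', labels)

grid = torchvision.utils.make_grid(images, nrow=5)
# make_grid returns (channels, height, width); imshow expects (height, width, channels)
plt.imshow(np.transpose(grid, (1, 2, 0)))
plt.show()  # needed when running as a script rather than in a notebook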


len: 2
types: <class 'torch.Tensor'> <class 'torch.Tensor'>
shapes: torch.Size([10, 1, 28, 28]) torch.Size([10])
labels: tensor([9, 0, 0, 3, 0, 2, 7, 2, 5, 5])
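
The np.transpose(grid, (1, 2, 0)) call is needed because PyTorch stores images channel-first, while matplotlib expects the channel axis last. A small sketch of that axis change, using a random tensor as a stand-in for the grid (the shape 3x62x152 is an assumption chosen for illustration):

```python
import numpy as np
import torch

# Stand-in for an image grid in PyTorch layout: (channels, height, width).
grid = torch.rand(3, 62, 152)

# matplotlib's imshow expects (height, width, channels) for color images.
grid_hwc = np.transpose(grid.numpy(), (1, 2, 0))
print(grid.shape, '->', grid_hwc.shape)

# The same reordering with PyTorch's own permute method.
grid_hwc_torch = grid.permute(1, 2, 0)
print(grid_hwc_torch.shape)
```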


(3) How to Plot Images Using PyTorch DataLoader


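python
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

train_set = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed path; the original value is missing from this listing
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

how_many_to_plot = 20

train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=1, shuffle=True
)

# The figure, subplot, axis and show calls are assumed; they are missing from this listing.
plt.figure(figsize=(50, 50))
for i, batch in enumerate(train_loader, start=1):
    image, label = batch
    plt.subplot(10, 10, i)
    plt.imshow(image.reshape(28,28), cmap='gray')
    plt.axis('off')
    plt.title(train_set.classes[label.item()], fontsize=28)
    if i >= how_many_to_plot: break
plt.show()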

2.Section 2: Neural Networks and PyTorch Design

Bird's eye view of the process

From a high-level perspective or bird's eye view of our deep learning project, we prepared our data, and now, we are ready to build our model.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model's results

When we say model, we mean our network. The words model and network mean the same thing. What we ultimately want our network to do is model or approximate a function that maps image inputs to the correct output class.

The original author recommends the deep learning fundamentals series as an introduction. Even if you do not go through the whole series, you should at least read these 5 lessons:

If you just want a crash course on CNNs, these are the specific posts to see:


Here is the link to the videos for the whole series: Deep Learning playlist overview & Machine Learning intro (youtube.com)


2.1.Build PyTorch CNN - Object Oriented Neural Networks
(1) Quick object oriented programming review

I recommend watching the explanation in the video: 17-Build PyTorch CNN - Object Oriented Neural Networks_哔哩哔哩_bilibili(from 01:44 to 09:30)

And the note: 囫囵吞枣学Python(1)------类-CSDN博客

(2) Building a neural network in PyTorch

Today we need to understand two terms: layer and forward method.

First: what is a layer?

  • layer:
    • a transformation (using code)
    • a collection of weights (using data)

Layers in PyTorch are defined by classes, so in code, our layers will be objects. (In the note 囫囵吞枣学Python(1)------类-CSDN博客, Students is defined as a class, so student1 is an object. If this is unclear, I suggest reviewing the first part: Quick object oriented programming review.)
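
As a quick illustration of a layer being both a transformation and a collection of weights, the sketch below builds one nn.Linear object and inspects it. The sizes 4 and 3 are arbitrary values picked for the example.

```python
import torch
import torch.nn as nn

# A layer is an object created from a class.
layer = nn.Linear(in_features=4, out_features=3)

# Data: the layer owns a weight tensor and a bias tensor.
print(layer.weight.shape)  # torch.Size([3, 4])
print(layer.bias.shape)    # torch.Size([3])

# Code: calling the layer applies its transformation to an input tensor.
t = torch.ones(1, 4)
out = layer(t)
print(out.shape)           # torch.Size([1, 3])

# The result matches the formula that a linear layer implements: t @ W^T + b.
manual = t @ layer.weight.T + layer.bias
print(torch.allclose(out, manual))
```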

Second: what is the forward method?

When we pass a tensor to our network as input, the tensor flows forward through each layer transformation until it reaches the output layer. This process of a tensor flowing forward through the network is known as a forward pass, and the forward method is where we implement it.

The torch.nn package includes a large number of classes and methods that we can use directly.

First, let's create a simple class to represent a neural network:

python
class Network:
    def __init__(self):
        self.layer = None

    def forward(self, t):
        t = self.layer(t)
        return t

Second, make our Network class extend nn.Module:

python
import torch.nn as nn

class Network(nn.Module): # line 1
    def __init__(self):
        super().__init__() # line 3
        self.layer = None

    def forward(self, t):
        t = self.layer(t)
        return t

Both versions have one thing in common: the layer is still empty. Now let's replace None with some real layers that we will use often:

python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before layers are assigned as attributes
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t
  • Conv2d: convolutional layers;

  • Linear: linear layers, which are also called fully connected layers; a third name we sometimes hear is dense. So linear, dense, and fully connected all refer to the same type of layer;

  • We use the name out for the last linear layer because the last layer in the network is the output layer;

That is the end of this post. It's perfectly normal if you don't understand this code thoroughly yet. Don't worry about that: keep learning with the question in mind, and you will gradually understand it.

2.2.CNN Layers - Deep Neural Network Architecture

Our goal in this post is to better understand the layers we have defined. To do this, we're going to learn about the parameters and the values that we passed for these parameters in the layer constructors.

(1) Parameter vs Argument

Parameters are used in function definitions. For this reason, we can think of parameters as place-holders.

Arguments are the actual values that are passed to the function when the function is called.

In our Network's case, the names like in_channels and out_channels are the parameters, and the values that we have specified like 1 and 6 are the arguments.

python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before layers are assigned as attributes
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t
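
The parameter versus argument distinction can also be seen in plain Python. In the sketch below, make_layer is a hypothetical helper written only for this illustration; it is not part of PyTorch.

```python
import inspect


def make_layer(in_channels, out_channels, kernel_size=3):
    """Hypothetical helper; its signature lists three parameters."""
    return {"in": in_channels, "out": out_channels, "k": kernel_size}


# Parameters: the placeholder names in the function definition.
parameter_names = list(inspect.signature(make_layer).parameters)
print(parameter_names)  # ['in_channels', 'out_channels', 'kernel_size']

# Arguments: the actual values supplied when the function is called.
layer = make_layer(in_channels=1, out_channels=6, kernel_size=5)
print(layer)  # {'in': 1, 'out': 6, 'k': 5}
```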
(2) Two types of parameters
  • Hyperparameters

  • Data dependent hyperparameters

In fact, a lot of terms in deep learning are used loosely, and the word parameter is one of them. Try not to let it throw you off.

In other words, these terms are not as important as you imagine.

  1. Hyperparameters

    Hyperparameters are parameters whose values are chosen manually and arbitrarily. As neural network programmers, we choose hyperparameter values mainly by trial and error, and increasingly by reusing values that have proven to work well in the past.

    In plain terms, we usually test and tune these parameters to find the values that work best.

    In our Network's case, the parameters kernel_size, out_channels and out_features are hyperparameters, with one exception: the out_features of the output layer is not a hyperparameter.

    Parameter Description
    kernel_size Sets the height and width of the filter.
    out_channels Sets the number of filters. One filter produces one output channel.
    out_features Sets the size of the output tensor.

    One pattern that shows up quite often is that we increase our out_channels as we add additional convolutional layers, and after we switch to linear layers we shrink our out_features as we filter down to our number of output classes. We'll dive deeper into this in the next post.

  2. Data dependent hyperparameters

    Data dependent hyperparameters are parameters whose values are dependent on data.

    Two typical parameters are the in_channels of the first convolutional layer, and the out_features of the output layer. The in_channels of the first convolutional layer depend on the number of color channels present inside the images that make up the training set. Since we are dealing with grayscale images, we know that this value should be a 1. The out_features for the output layer depend on the number of classes that are present inside our training set. Since we have 10 classes of clothing inside the Fashion-MNIST dataset, we know that we need 10 output features.

    In general, the input to one layer is the output from the previous layer, and so all of the in_channels in the convolutional layers and in_features in the linear layers depend on the data coming from the previous layer.

    Why do we have 12*4*4? The 12 comes from the number of output channels in the previous layer, but why do we have the two 4s? We cover how we get these values in a future post.
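
As a preview of where the two 4s come from, the arithmetic can be sketched as follows. This assumes details that are not stated in this section: a 28x28 input, no padding, stride 1 convolutions, and a 2x2 max pooling with stride 2 after each convolutional layer. Treat it as an illustration under those assumptions.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution or pooling window."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28                                      # Fashion-MNIST images are 28x28
size = conv_output_size(size, kernel_size=5)   # conv1: 28 -> 24
size = conv_output_size(size, 2, stride=2)     # assumed 2x2 max pool: 24 -> 12
size = conv_output_size(size, kernel_size=5)   # conv2: 12 -> 8
size = conv_output_size(size, 2, stride=2)     # assumed 2x2 max pool: 8 -> 4

out_channels = 12                              # out_channels of conv2
in_features = out_channels * size * size
print(size, in_features)                       # 4 192
```

Under these assumptions the flattened tensor entering fc1 has 12 * 4 * 4 = 192 values, which matches the in_features=192 that PyTorch reports for fc1.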

(3) Descriptions of parameters
Layer Param name Param value The param value is
conv1 in_channels 1 the number of color channels in the input image.
conv1 kernel_size 5 a hyperparameter.
conv1 out_channels 6 a hyperparameter.
conv2 in_channels 6 the number of out_channels in previous layer.
conv2 kernel_size 5 a hyperparameter.
conv2 out_channels 12 a hyperparameter (higher than previous conv layer).
fc1 in_features 12*4*4 the length of the flattened output from previous layer.
fc1 out_features 120 a hyperparameter.
fc2 in_features 120 the number of out_features of previous layer.
fc2 out_features 60 a hyperparameter (lower than previous linear layer).
out in_features 60 the number of out_features of previous layer.
out out_features 10 the number of prediction classes.
(4) Kernel vs Filter

We often use the words filter and kernel interchangeably in deep learning. However, there is a technical distinction between these two concepts.

A kernel is a 2D tensor, and a filter is a 3D tensor that contains a collection of kernels. We apply a kernel to a single channel, and we apply a filter to multiple channels.

Reference: neural networks - Difference between "kernel" and "filter" in CNN - Cross Validated (stackexchange.com)
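
The distinction shows up directly in the shape of a convolutional layer's weight tensor. A minimal sketch, using a layer with the same arguments as conv2 from our Network:

```python
import torch.nn as nn

conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

# Weight shape: (number of filters, depth of each filter, height, width).
print(conv2.weight.shape)        # torch.Size([12, 6, 5, 5])

one_filter = conv2.weight[0]     # a 3D tensor: 6 kernels stacked together
one_kernel = conv2.weight[0][0]  # a 2D tensor: applied to a single input channel
print(one_filter.shape)          # torch.Size([6, 5, 5])
print(one_kernel.shape)          # torch.Size([5, 5])
```

Each of the 12 filters holds one 5x5 kernel per input channel, which is why the second axis equals in_channels.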

2.3.CNN Weights - Learnable Parameters in Neural Networks
(1) Another type of parameters
  • Learnable parameters

    Learnable parameters are parameters whose values are learned during the training process. With learnable parameters, we typically start out with a set of arbitrary values, and these values then get updated in an iterative fashion as the network learns.

    In fact, when we say that a network is learning, we specifically mean that the network is learning the appropriate values for the learnable parameters.
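
To make "values get updated as the network learns" concrete, here is a minimal sketch of a single update step on one small layer. The input, the target value of 10, the learning rate, and the use of plain SGD are all assumptions chosen for illustration; the course covers training properly in a later section.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=3, out_features=1)

# Learnable parameters start from arbitrary (random) values.
before = layer.weight.detach().clone()
print(layer.weight.requires_grad)  # True

# One illustrative update step with plain SGD on a made-up target.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.ones(1, 3)
target = torch.tensor([[10.0]])
loss = ((layer(x) - target) ** 2).mean()
loss.backward()
optimizer.step()

after = layer.weight.detach().clone()
new_loss = ((layer(x) - target) ** 2).mean()
print(before)
print(after)                               # the values have moved
print(loss.item(), '->', new_loss.item())  # the error shrinks after the step
```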

(2) Getting an Instance of the Network
python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before layers are assigned as attributes
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

For the convolutional layers, the kernel_size argument is a Python tuple (5,5) even though we only passed the number 5 in the constructor. This is because our filters actually have a height and width, and when we pass a single number, the code inside the layer's constructor assumes that we want a square filter.

The stride is an additional parameter that we could have set, but we left it out. When the stride is not specified in the layer constructor the layer automatically sets it. The stride tells the conv layer how far the filter should slide after each operation in the overall convolution. This tuple says to slide by one unit when moving to the right and also by one unit when moving down.

For the linear layers, we have an additional parameter called bias, which has a default value of True. It is possible to turn it off by setting it to False.
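
These defaults can be checked directly on the layer objects. A short sketch; the non-square filter and the stride of (2, 1) are arbitrary values chosen for illustration:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
print(conv.kernel_size)  # (5, 5): a single number is expanded to a square
print(conv.stride)       # (1, 1): the default when stride is not given

# A non-square filter and a custom stride can be passed as tuples.
wide = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=(3, 5), stride=(2, 1))
print(wide.kernel_size, wide.stride)

# Linear layers have a bias by default; bias=False turns it off.
with_bias = nn.Linear(in_features=60, out_features=10)
no_bias = nn.Linear(in_features=60, out_features=10, bias=False)
print(with_bias.bias is not None, no_bias.bias is None)
```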

In the video, the author also mentioned the word 'override'. We call it '重写' in Chinese. It's not important to focus on it here, so we don't have to worry about it.

(3) Accessing the Network's Layers
python
print(network.conv1)
print(network.conv2)
print(network.fc1)
print(network.fc2)
print(network.out)

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
Linear(in_features=192, out_features=120, bias=True)
Linear(in_features=120, out_features=60, bias=True)
Linear(in_features=60, out_features=10, bias=True)
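
Besides dot notation, the layers can be walked in a loop. The sketch below rebuilds the same Network and uses named_children() to list the layers and parameters() to count the learnable values; the counting is an extra illustration and not part of the original post.

```python
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # the forward pass is not implemented yet at this point of the course
        return t


network = Network()

# Each attribute that holds a layer is registered under its attribute name.
names = [name for name, _ in network.named_children()]
print(names)

# Total number of learnable values (weights and biases) across all layers.
total = sum(p.numel() for p in network.parameters())
print(total)
```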
(4) Accessing the Layer Weights

Let's first look at some examples.

First, convolutional layers:

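python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before layers are assigned as attributes
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network.conv1)               # the layer itself
print(network.conv1.weight)        # its weight tensor
print(network.conv1.weight.shape)  # the shape of the weight tensor
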

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

Parameter containing:
tensor([[[[ 0.1232,  0.1745, -0.0915,  0.0615,  0.1538],
          [-0.0747, -0.0346,  0.0290, -0.0959,  0.0164],
          [ 0.0145, -0.0813, -0.1848, -0.1106, -0.1396],
          [-0.1269, -0.0738, -0.0959, -0.1527,  0.0644],
          [ 0.1800, -0.0883, -0.0080,  0.1344,  0.0920]]],

        [[[-0.0629,  0.1750, -0.1389,  0.1275, -0.1797],
          [-0.1755,  0.1946, -0.1925,  0.0654, -0.1339],
          [-0.1237, -0.1942, -0.1812, -0.1883,  0.1600],
          [ 0.1417,  0.1051,  0.1502, -0.1608, -0.1157],
          [ 0.0644,  0.1915, -0.1855,  0.1809, -0.0025]]],

        [[[ 0.1701, -0.0435, -0.1149, -0.0337,  0.0830],
          [ 0.0006,  0.0686,  0.1429, -0.1244, -0.0048],
          [ 0.0632, -0.1001,  0.1045, -0.1651,  0.1013],
          [ 0.1934,  0.1950, -0.0350,  0.0422, -0.0931],
          [-0.1226, -0.1583,  0.1330,  0.1100, -0.1544]]],

        [[[-0.0572, -0.0689,  0.1695,  0.0712,  0.0893],
          [ 0.1183, -0.0032, -0.0855,  0.0300,  0.0392],
          [-0.1271, -0.0850, -0.1440, -0.0717,  0.1915],
          [-0.0673, -0.1499,  0.0396,  0.1853, -0.1650],
          [ 0.1341, -0.1745, -0.1512,  0.1500, -0.1642]]],

        [[[-0.0190,  0.0146, -0.1059, -0.0617, -0.0630],
          [ 0.0148, -0.1553,  0.0026,  0.1763,  0.0672],
          [-0.1689,  0.1345,  0.1268,  0.1737,  0.1519],
          [ 0.1675, -0.0937, -0.0181,  0.0267, -0.0231],
          [-0.1085,  0.0345, -0.0552,  0.0690, -0.0950]]],

        [[[ 0.0343, -0.1318,  0.0569,  0.1160, -0.1973],
          [-0.0326,  0.1682,  0.1729, -0.0455, -0.0761],
          [-0.0124,  0.1356,  0.1893, -0.0778,  0.0509],
          [-0.1544, -0.0527,  0.1602,  0.1525,  0.0864],
          [ 0.0832,  0.1645,  0.1838,  0.1726, -0.1858]]]], requires_grad=True)

torch.Size([6, 1, 5, 5])

If we change the code to look at the second convolutional layer instead:

python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before layers are assigned as attributes
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network.conv2)         # the layer itself
print(network.conv2.weight)  # its weight tensor


Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))

Parameter containing:
tensor([[[[-4.2884e-02, -1.8646e-02, -3.6468e-02, -2.1878e-02, -6.1812e-02],
          [ 2.9104e-02,  4.3656e-02,  2.1134e-02,  6.8243e-02,  3.9659e-02],
          [-4.8924e-02,  5.9818e-02, -3.0731e-02,  3.9902e-02, -9.6543e-03],
          [ 4.2226e-03,  7.7117e-02, -5.0710e-02,  7.5835e-02, -7.0011e-02],
          [ 2.4738e-02, -7.1612e-03, -6.5956e-02, -7.1910e-02, -4.0692e-03]],

         [[ 5.9244e-02, -5.4084e-02, -4.1429e-02, -5.3655e-02, -2.9016e-02],
          [ 5.7895e-02,  1.6712e-02, -5.7220e-02,  2.0745e-02,  7.3740e-02],
          [ 6.4129e-02,  4.3146e-02,  2.0793e-02,  4.8607e-02, -1.8870e-02],
          [-1.8324e-02, -6.2051e-02, -4.5263e-02,  3.0059e-02,  2.4538e-02],
          [ 2.5017e-02, -6.1615e-02, -1.4608e-02, -2.3294e-02, -1.0028e-02]],

         [[ 3.9857e-02, -6.9648e-02, -4.9927e-02,  7.9932e-03, -6.4465e-02],
          [-3.1335e-02,  4.7432e-02,  1.8392e-02, -9.7926e-03,  7.6205e-02],
          [ 5.1769e-02, -3.8508e-02,  2.1279e-02,  5.8801e-02, -7.6870e-02],
          [ 6.5906e-02, -6.5944e-02,  6.4801e-02, -5.0759e-02, -2.9017e-02],
          [ 5.1388e-02,  3.3068e-02,  5.1049e-02,  8.1391e-02,  5.6871e-02]],

         [[ 7.6068e-03,  5.7764e-02,  1.3304e-02,  2.3320e-02,  7.1435e-02],
          [ 6.1237e-02,  2.0400e-02,  2.8379e-05,  7.6489e-02,  7.2457e-02],
          [ 9.6467e-03, -1.4250e-02, -7.3180e-02, -2.4022e-02, -2.0675e-02],
          [-5.6530e-02, -4.8809e-03,  2.8938e-02,  7.1006e-02, -4.4209e-02],
          [-2.6500e-02, -3.5677e-03,  6.7954e-02, -3.1715e-02,  5.1770e-02]],

         [[ 1.3207e-02,  3.0945e-02, -7.3218e-02,  5.3696e-02, -5.5415e-02],
          [ 6.4929e-02, -3.0792e-02, -2.1799e-02,  4.3814e-02,  6.4807e-02],
          [-1.4082e-02, -1.2352e-02, -4.1357e-02, -5.0738e-02, -1.2696e-02],
          [ 2.3784e-02,  4.4909e-02, -5.8380e-02,  6.7909e-02, -8.2366e-03],
          [-7.9928e-02,  3.4381e-02, -5.9752e-02, -7.8087e-02,  2.9481e-02]],

         [[ 2.8638e-02, -6.7411e-02,  4.7579e-02,  1.0333e-02,  6.7232e-02],
          [ 4.3504e-02,  5.4487e-02,  5.1175e-02, -6.6485e-03,  6.6359e-02],
          [ 3.0006e-02,  6.1103e-02, -2.9882e-02, -6.9170e-02, -3.3795e-02],
          [ 5.4645e-02,  5.9930e-02,  7.2578e-02, -3.9443e-02,  5.6268e-02],
          [-6.5664e-04, -3.5357e-02,  6.3044e-02,  2.8497e-02, -4.8495e-02]]],

        [[[-1.5204e-02, -2.0982e-04,  5.2414e-02,  6.6475e-02, -3.9259e-02],
          [ 1.8214e-02,  4.8985e-02, -1.5981e-02, -3.1356e-02, -7.6915e-03],
          [-1.8750e-02, -2.3607e-02, -2.1833e-02,  7.7038e-02, -4.7328e-02],
          [-4.3814e-02, -4.0106e-02,  3.3002e-02,  7.4004e-02,  7.5722e-02],
          [ 3.9917e-02, -3.7348e-02, -6.0048e-02,  2.1473e-02, -5.3794e-02]],

         [[-6.9337e-02, -6.3384e-02, -7.8037e-02,  2.7754e-02,  4.9844e-02],
          [-1.8384e-02,  8.1333e-02,  8.1422e-02, -2.4105e-03, -6.8615e-02],
          [-4.6947e-02, -5.5351e-02,  4.5957e-02,  2.9277e-02, -1.3860e-02],
          [-4.6693e-02,  2.8683e-02,  6.1394e-02,  8.0850e-02, -5.0913e-02],
          [ 5.7079e-02,  6.9726e-02, -7.1289e-03,  6.4266e-04,  5.3523e-02]],

         [[-2.6457e-03, -2.0547e-02, -7.2038e-02, -2.7709e-02, -6.1851e-02],
          [-4.1947e-02,  3.4758e-02,  2.2606e-02, -5.2566e-02,  8.0384e-02],
          [-3.1233e-02, -4.4558e-02,  8.4542e-03,  3.6306e-02, -8.1493e-02],
          [ 1.7083e-02,  8.3195e-04, -7.9616e-02,  2.9549e-03, -5.3943e-02],
          [ 3.2066e-02,  7.3952e-03,  1.1623e-02, -5.5744e-02,  1.7965e-02]],

         [[ 5.3836e-02, -6.7208e-02, -6.1651e-02, -1.8709e-02,  5.6753e-02],
          [ 8.2612e-03, -3.2186e-02, -2.6628e-02, -4.6597e-02, -7.2020e-02],
          [-4.3285e-03,  1.1460e-02, -3.0413e-02,  5.2102e-02, -7.5177e-02],
          [-7.1347e-02,  5.5588e-02,  6.9111e-02,  1.4323e-02, -5.7546e-02],
          [-1.8687e-02, -7.7605e-02, -8.0353e-02, -3.2596e-02, -3.1418e-02]],

         [[-6.9735e-02,  5.3523e-02, -5.3416e-02,  4.5771e-02,  5.6954e-02],
          [ 6.3120e-02,  5.5763e-02,  1.6067e-02,  8.6567e-03, -4.2644e-02],
          [-5.9344e-02, -4.6653e-03,  7.6593e-02, -7.3292e-02, -5.3917e-02],
          [ 6.6566e-02,  3.1131e-02,  7.7349e-02, -2.7129e-02,  8.4518e-03],
          [ 1.6985e-02, -7.0555e-02,  3.8170e-02, -1.1612e-02, -7.6542e-02]],

         [[-6.0196e-02, -6.4580e-02, -3.1164e-03, -4.1933e-02,  7.1420e-03],
          [ 7.9487e-02, -1.4821e-02,  2.2844e-03, -1.5251e-03,  3.7557e-02],
          [-6.8896e-02, -6.6881e-03,  5.7520e-02,  1.8301e-02, -3.7004e-02],
          [-2.5592e-02,  3.0609e-02,  6.9578e-02,  5.5549e-02, -5.3245e-02],
          [ 4.6727e-02,  8.0116e-02, -7.5505e-02, -2.8765e-02, -3.5874e-02]]],

        [[[ 5.3670e-02,  7.4484e-02, -6.3226e-02, -2.9761e-02,  6.0873e-02],
          [-6.1811e-02,  1.6729e-02,  4.5729e-02,  2.8226e-04, -1.3171e-02],
          [-1.2364e-02,  7.2936e-02,  1.0765e-02,  3.1374e-02,  1.7582e-02],
          [ 1.3305e-02, -6.6938e-02, -6.6351e-02,  4.8234e-02,  2.5997e-02],
          [-4.1954e-03, -6.4869e-02,  1.7950e-02,  3.3482e-02, -1.2225e-02]],

         [[-1.3839e-02,  3.9010e-02, -6.4779e-02, -7.0044e-02, -2.7837e-02],
          [ 5.8636e-02,  7.5278e-02,  7.1607e-02,  5.5469e-02,  5.4468e-02],
          [-7.4318e-02,  7.4283e-03, -2.3738e-02,  6.4434e-02,  1.9524e-02],
          [ 7.8238e-02,  5.3939e-02, -4.3555e-02,  6.3559e-02,  8.1849e-04],
          [-5.5254e-02, -6.8373e-02,  5.0078e-02,  2.6748e-02,  4.5676e-02]],

         [[ 7.1268e-02,  3.6513e-03,  2.4753e-02,  3.5536e-02,  2.0245e-02],
          [ 5.9411e-02,  6.5015e-02,  6.7302e-02,  5.5706e-02, -7.3357e-02],
          [ 6.7356e-02,  2.8092e-02,  8.0472e-02,  6.9567e-02, -3.6824e-02],
          [-4.3014e-02, -7.9176e-02,  5.1021e-02, -3.2842e-02,  4.3498e-02],
          [-3.7790e-03,  3.6721e-03,  3.3520e-02,  3.8218e-02, -6.1545e-02]],

         [[-4.7160e-02, -2.7351e-02, -6.6609e-02, -6.4513e-03,  3.4438e-02],
          [-4.4675e-03, -6.5095e-02,  6.1610e-02,  8.0325e-02,  5.7229e-02],
          [-1.3750e-02, -1.0938e-02, -4.5011e-02,  6.9686e-02,  5.1559e-02],
          [ 5.8902e-02, -1.1045e-02, -4.3365e-02, -2.8516e-04,  7.0693e-02],
          [ 5.4149e-02, -3.4944e-02, -4.5348e-02, -4.7880e-02, -6.0826e-02]],

         [[ 8.4448e-03,  9.3816e-03,  3.4866e-02,  3.8719e-04, -4.4713e-02],
          [-3.7519e-02,  1.4705e-02,  3.1401e-02,  6.1778e-02, -2.9698e-02],
          [ 2.0491e-02, -1.3609e-02,  6.7055e-02,  4.1654e-02, -3.3637e-02],
          [-6.8364e-02, -5.5866e-02, -7.7622e-02,  1.5276e-02, -3.5520e-02],
          [ 3.1254e-02, -7.0029e-02,  4.4888e-02,  5.9723e-02,  3.6382e-02]],

         [[-2.3066e-02,  6.7966e-02,  5.1811e-02, -5.9159e-02,  3.1069e-02],
          [ 5.1321e-02,  6.7464e-02,  7.5866e-02, -5.5414e-02,  4.0420e-02],
          [-4.4354e-02,  7.9306e-02, -1.4644e-02, -9.3875e-03,  2.6070e-02],
          [ 7.4850e-02,  6.0425e-02, -2.0909e-02,  5.9285e-02,  4.6566e-02],
          [ 1.3756e-02,  1.1649e-02,  2.8588e-02, -1.3022e-03,  2.3256e-02]]],


        [[[ 7.6311e-02,  7.5261e-02,  5.7561e-02, -4.3356e-02, -7.2909e-02],
          [-4.6708e-02,  5.1551e-02, -4.8101e-02, -5.1413e-02, -3.6152e-02],
          [ 6.0626e-02,  5.6325e-04, -2.1743e-02,  3.3400e-02,  7.9141e-02],
          [ 3.5604e-02,  6.9508e-02, -2.1984e-02,  5.9585e-02, -9.4945e-03],
          [-4.0188e-02, -5.4732e-03,  6.8583e-02,  3.0551e-02, -1.1802e-02]],

         [[ 5.2653e-02,  6.1670e-02,  2.6309e-03,  5.3165e-02, -3.8166e-02],
          [ 8.1076e-02,  5.7386e-02, -6.1753e-02,  4.9131e-02,  3.6430e-02],
          [-4.7202e-02, -4.2705e-02, -4.6333e-02,  6.7525e-02,  4.8547e-02],
          [ 7.9853e-02, -8.0172e-02,  4.9944e-03, -5.7631e-02, -6.1132e-03],
          [-5.5667e-03, -2.1569e-02, -2.8139e-02, -8.9662e-03,  3.5014e-02]],

         [[-3.0472e-02,  3.0703e-02,  5.1314e-02, -6.2341e-02, -6.3894e-02],
          [-7.6258e-02,  1.6020e-02,  3.3108e-02, -6.8395e-02, -5.7936e-02],
          [ 6.3067e-02,  6.1881e-04, -8.0695e-02, -1.7197e-02,  2.2778e-02],
          [ 4.4634e-02, -7.0455e-02, -2.1533e-02,  3.6857e-02,  5.7196e-02],
          [ 3.5345e-02,  6.9631e-02,  5.5229e-02,  4.2128e-02, -3.4088e-02]],

         [[-7.2163e-03,  3.5563e-02, -3.1936e-02,  1.2877e-02, -2.3022e-02],
          [-5.9097e-02,  1.5192e-02,  6.8320e-02, -4.6643e-02, -1.9811e-03],
          [-5.9560e-03,  7.7662e-02,  7.9657e-02, -4.4968e-02, -7.6457e-02],
          [-1.1028e-02, -3.5175e-02, -1.3390e-02,  5.1161e-02, -5.4926e-02],
          [-4.1221e-02,  6.2617e-02, -5.9798e-02, -8.9769e-04,  6.3048e-02]],

         [[ 4.4751e-02, -5.1591e-02, -3.6866e-02, -7.4997e-02,  4.9472e-02],
          [-6.2221e-02,  6.6295e-02, -5.0621e-02,  3.4758e-02, -2.1337e-02],
          [ 5.0706e-02, -5.2147e-02,  6.8346e-02,  3.2746e-02,  7.1333e-02],
          [-1.4602e-02, -6.2453e-02, -1.9406e-02,  6.9041e-02, -2.8379e-02],
          [ 3.4721e-03,  7.7462e-02,  5.1289e-02,  3.3926e-02, -2.8289e-02]],

         [[-5.0928e-02, -2.8340e-02,  2.2817e-02,  7.0458e-02,  4.2438e-02],
          [-3.3680e-02,  4.5647e-02, -4.9270e-02, -2.8433e-02,  2.9541e-02],
          [ 2.3391e-02, -7.4426e-02,  6.4900e-02, -7.1001e-02,  4.5884e-02],
          [-5.7277e-02, -4.6285e-03,  5.3839e-02, -5.6219e-02,  7.9665e-02],
          [-8.1319e-02, -8.1240e-02,  1.2502e-02,  5.5328e-02,  2.0030e-02]]],

        [[[ 4.8441e-02,  4.8989e-02, -2.4322e-02, -5.9358e-02, -6.8357e-02],
          [-2.1089e-02,  4.5388e-02, -5.1134e-03,  4.3126e-02, -1.6892e-03],
          [-8.1167e-02, -5.7231e-02,  3.9827e-02, -6.3646e-03, -3.9094e-02],
          [-4.2316e-02,  5.3464e-02, -8.8957e-03, -4.7075e-02, -1.7774e-02],
          [-6.5931e-02,  1.1694e-02, -3.9309e-02,  4.1809e-02,  1.2976e-03]],

         [[-6.8713e-02, -1.7812e-02, -1.4486e-02,  8.3056e-03, -6.8082e-02],
          [-1.8569e-02,  2.6123e-02,  2.4870e-05,  5.6066e-02,  4.5435e-02],
          [ 5.3947e-02,  1.4763e-02,  1.4906e-02, -5.5155e-02,  4.7769e-02],
          [ 5.3297e-02, -3.9101e-02,  4.6465e-02,  1.1151e-02,  5.5033e-02],
          [ 3.6748e-03,  3.9723e-02, -4.1154e-02,  4.6245e-02, -3.3245e-02]],

         [[-1.8014e-02, -8.0139e-02, -6.3273e-03, -3.8566e-02,  4.0923e-03],
          [ 5.8483e-02, -6.5439e-02,  2.0173e-02,  7.2705e-02, -3.5129e-02],
          [-4.8423e-02, -5.2663e-02, -1.3957e-02,  2.9158e-02,  7.8463e-02],
          [ 7.1398e-04,  6.5734e-02,  3.6854e-02, -2.1278e-02,  3.3324e-02],
          [ 3.6285e-02,  4.2179e-02, -2.9803e-02, -4.1720e-03, -2.0233e-02]],

         [[-2.7885e-02,  5.2520e-02,  5.1337e-02, -2.6349e-02,  1.5047e-02],
          [ 8.1576e-02, -3.3374e-02,  2.2938e-02, -4.8218e-02,  3.5318e-02],
          [-6.8747e-02, -4.1312e-02,  7.9037e-03,  6.8197e-02, -5.2138e-02],
          [ 2.2267e-03,  2.0724e-02, -5.1848e-02,  4.9394e-02,  7.5763e-02],
          [ 3.7441e-02,  2.7114e-02,  2.8150e-02, -5.4438e-02, -5.9701e-02]],

         [[ 7.5385e-02,  2.2393e-02, -6.8777e-02, -2.0514e-02,  6.8338e-02],
          [-6.6630e-02,  7.4462e-02,  4.0799e-02, -1.2080e-02,  7.0668e-02],
          [-7.2318e-03, -4.5113e-02, -8.0733e-03,  6.0332e-02,  7.7593e-02],
          [ 1.5821e-02, -2.2018e-02,  7.7987e-02,  4.0169e-02, -6.3680e-02],
          [-8.1590e-02,  6.3925e-02,  6.4027e-02,  3.3750e-02, -6.6571e-02]],

         [[-4.0221e-02,  2.1132e-02, -7.6372e-02,  3.1901e-02, -2.6639e-03],
          [ 6.2732e-02,  2.5971e-02, -3.8870e-02, -5.9896e-02,  3.6378e-02],
          [ 1.7306e-02,  4.8518e-02, -3.4341e-02, -5.0257e-03, -7.7493e-02],
          [-5.3312e-02,  4.5801e-02,  6.3797e-02, -7.6749e-02, -1.9681e-02],
          [ 4.9668e-02,  2.5340e-02, -4.5465e-02,  5.5866e-02, -1.7633e-03]]],

        [[[ 3.3887e-02,  4.1100e-02, -6.0800e-02,  3.0658e-02, -2.4294e-02],
          [ 2.2048e-02,  3.4223e-02, -2.1182e-02, -1.4165e-02,  2.0991e-02],
          [-3.8944e-03,  2.8239e-02, -3.5322e-02, -2.2963e-02,  2.2183e-02],
          [-4.6182e-02,  7.5100e-02, -5.7425e-02, -7.0277e-02,  7.4489e-02],
          [-6.5370e-02,  6.8510e-02,  2.4596e-02,  7.9349e-03,  4.1632e-02]],

         [[-8.0785e-02, -5.5982e-02, -5.2925e-02,  4.4351e-02, -5.4612e-02],
          [-2.7587e-02, -4.9968e-02,  4.6770e-02, -4.7240e-02,  5.7632e-02],
          [ 2.1552e-02,  1.9329e-03, -3.6635e-02, -4.1714e-04,  7.6460e-02],
          [ 8.0785e-02,  5.0883e-02,  7.0737e-02,  4.5160e-02,  1.2882e-03],
          [ 7.1417e-02, -2.8139e-02,  6.3305e-02, -2.5239e-02,  7.1895e-02]],

         [[-7.3631e-02, -3.3411e-02,  3.2707e-02, -6.8281e-02,  2.5994e-02],
          [ 5.2490e-02, -5.4156e-02, -8.1550e-02, -2.4794e-02, -6.3099e-02],
          [ 4.9664e-02,  1.6858e-02, -4.8651e-02,  1.4407e-02, -7.8078e-02],
          [ 8.1470e-03,  2.8146e-03,  5.2201e-02, -2.3638e-02, -3.0703e-02],
          [ 7.4847e-02, -2.3422e-02,  2.0211e-02, -3.9981e-02,  4.2893e-02]],

         [[-3.6386e-02,  6.5790e-02,  4.4377e-02, -3.0673e-02,  6.3339e-02],
          [-3.0514e-03,  6.0295e-02,  2.9729e-02, -4.2983e-02,  3.9292e-02],
          [ 3.1405e-02, -7.3552e-02,  3.4892e-03, -2.5254e-02,  8.5877e-03],
          [ 7.1609e-02, -5.1557e-02, -4.3620e-03, -1.5285e-02,  1.2429e-03],
          [ 6.9231e-02,  3.3598e-02,  2.8721e-02, -2.2294e-02, -7.9912e-02]],

         [[-7.5321e-03,  4.9022e-02, -8.0334e-02,  6.9277e-03, -7.5131e-02],
          [ 3.3288e-02,  7.3412e-03, -4.8152e-02, -7.9854e-02,  7.6142e-02],
          [-1.4706e-02,  5.5508e-02,  3.6131e-02,  7.8418e-02,  7.6000e-02],
          [ 2.8675e-02,  4.3452e-02, -5.0536e-02, -8.1299e-02, -4.2749e-02],
          [ 1.1211e-02, -5.1659e-02,  3.7996e-02, -1.3355e-02,  8.0545e-02]],

         [[-5.1144e-02, -2.0223e-02,  6.8809e-02, -5.1931e-02, -4.6908e-02],
          [ 7.0982e-02, -7.3319e-02, -7.8952e-02, -8.1581e-02, -5.7866e-02],
          [ 1.6373e-03, -6.2698e-02,  1.2726e-02,  5.8875e-02,  5.3699e-02],
          [-9.7808e-03, -2.1850e-02,  9.0736e-03, -5.3497e-02,  1.2925e-02],
          [ 2.4039e-02,  7.6288e-02, -2.5936e-03,  5.2908e-02, -6.6839e-02]]]],

torch.Size([12, 6, 5, 5])

Clearly, the weight of a convolutional layer is a rank-4 tensor whose shape is [out_channels, in_channels, kernel_size, kernel_size].

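In fact, each 5×5 block of weights inside the weight tensor is the content of a kernel, that is, a filter. This is the first time we look at these 5×5 blocks so directly, so to understand them properly it is worth studying what convolution actually is. The following videos can be watched in order: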



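In practice, however, that is not the case. In the three videos above, every animation shows a two-dimensional image and a two-dimensional filter, but a filter can have more dimensions. Take the two convolutional layers above as an example. In conv1, six 5×5 filters slide over the original image and produce 6 feature maps. In conv2, these 6 feature maps are stacked on top of each other like sheets of tofu skin. Seen from above, both the filter and the feature maps still look two-dimensional. Seen from the side, however, the stack of feature maps has a "height", and so does the filter. The "height" of the stack and the "height" of the filter, whose top view is 5×5, are the same: 6.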
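To make the "height" of a filter concrete, here is a minimal pure-Python sketch of one filter with depth C sliding over a stack of C feature maps. It is not how PyTorch implements convolution; the helper name `apply_filter` and the toy sizes (2 channels, 3×3 maps, a 2×2 kernel) are assumptions chosen for illustration, with stride 1 and no padding.

```python
def apply_filter(feature_maps, kernel):
    """Slide one filter of shape [C][k][k] over feature maps of shape [C][H][W].

    Returns a single 2D feature map of size (H - k + 1) x (W - k + 1).
    """
    channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    k = len(kernel[0])
    assert len(kernel) == channels, "filter depth must equal the number of input channels"

    output = []
    for i in range(height - k + 1):
        row = []
        for j in range(width - k + 1):
            total = 0.0
            # Sum over every channel and every position inside the k x k window.
            for c in range(channels):
                for di in range(k):
                    for dj in range(k):
                        total += feature_maps[c][i + di][j + dj] * kernel[c][di][dj]
            row.append(total)
        output.append(row)
    return output


# Two stacked 3x3 feature maps (hypothetical values).
stack = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
# One filter with depth 2 and a 2x2 top view, all weights set to 1.
ones_filter = [
    [[1, 1], [1, 1]],
    [[1, 1], [1, 1]],
]

result = apply_filter(stack, ones_filter)
print(result)  # [[16.0, 20.0], [28.0, 32.0]]
```

The filter has depth 2, yet it yields a single 2D map. conv2 in the network above works the same way with depth 6, and it holds 12 such filters, which is why its weight tensor has shape [12, 6, 5, 5].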



As for linear layers:

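python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before any layer is assigned as an attribute
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network.fc1)               # layer summary
print(network.fc1.weight)        # weight tensor of the first linear layer
print(network.fc1.weight.shape)  # its shape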

Linear(in_features=192, out_features=120, bias=True)

Parameter containing:
tensor([[-0.0705,  0.0281, -0.0345,  ...,  0.0707,  0.0039,  0.0049],
        [ 0.0185, -0.0139, -0.0371,  ..., -0.0181,  0.0251, -0.0153],
        [-0.0564, -0.0413,  0.0650,  ..., -0.0377,  0.0203,  0.0308],
        [-0.0158, -0.0176, -0.0564,  ...,  0.0637,  0.0659, -0.0601],
        [ 0.0695, -0.0088,  0.0561,  ..., -0.0145,  0.0285, -0.0415],
        [ 0.0472,  0.0183,  0.0253,  ...,  0.0290,  0.0631,  0.0598]],

torch.Size([120, 192])

If we make some changes to the code:

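python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before any layer is assigned as an attribute
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network.fc2)               # layer summary
print(network.fc2.weight)        # weight tensor of the second linear layer
print(network.fc2.weight.shape)  # its shape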

Linear(in_features=120, out_features=60, bias=True)

Parameter containing:
tensor([[-0.0125, -0.0863,  0.0814,  ...,  0.0540, -0.0659, -0.0135],
        [ 0.0179, -0.0563, -0.0102,  ..., -0.0826, -0.0142,  0.0196],
        [ 0.0842, -0.0230,  0.0686,  ..., -0.0196,  0.0216, -0.0474],
        [-0.0725,  0.0761, -0.0251,  ..., -0.0715,  0.0418,  0.0547],
        [ 0.0271,  0.0791,  0.0274,  ..., -0.0206,  0.0681, -0.0490],
        [ 0.0534,  0.0316,  0.0514,  ..., -0.0452,  0.0563,  0.0776]],

torch.Size([60, 120])

If we make some changes to the code again:

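python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before any layer is assigned as an attribute
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

network = Network()
print(network.out)               # layer summary
print(network.out.weight)        # weight tensor of the output layer
print(network.out.weight.shape)  # its shape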

Linear(in_features=60, out_features=10, bias=True)

Parameter containing:
tensor([[-9.5431e-02, -1.7005e-02,  7.1840e-02, -9.8549e-02, -7.6483e-02,
         -1.1399e-01, -7.4596e-02,  2.7645e-02,  4.7635e-02, -7.7890e-03,
         -1.2679e-01,  1.0586e-01, -3.0603e-02, -5.9335e-02,  2.3362e-02,
          1.2566e-01,  1.5932e-02,  6.4170e-02,  1.1894e-01,  1.1183e-02,
          5.5053e-02,  9.9240e-02,  9.1880e-02,  6.6743e-02, -1.9384e-02,
         -1.9155e-02,  4.9925e-02, -1.2826e-01,  4.2235e-02, -4.1539e-02,
          4.3719e-02,  1.1860e-02, -4.2918e-02,  6.8773e-02,  6.1953e-02,
          8.6118e-02,  7.6434e-02,  1.1209e-01, -2.3595e-02, -4.3001e-02,
          6.5224e-02,  1.2810e-01, -2.2255e-02, -1.4613e-02,  3.5808e-02,
          7.5827e-02,  9.4326e-02, -3.0887e-02, -4.9017e-02,  1.8240e-02,
          4.2232e-02, -1.0665e-01, -4.2260e-02,  4.7343e-02, -2.9680e-02,
          9.0191e-02,  2.2731e-02, -9.6192e-02, -6.2657e-02,  1.0931e-01],
        [ 1.0550e-01,  9.0423e-02, -9.3332e-02,  1.2436e-01,  3.2053e-02,
          8.7275e-02,  3.5335e-02,  9.3939e-02,  1.5199e-02,  5.1158e-02,
         -5.0064e-02,  9.0380e-02, -1.1103e-01,  2.5579e-02,  6.2949e-02,
          5.0965e-02,  1.8335e-02, -9.3640e-02,  4.6990e-02, -7.4176e-02,
         -1.0114e-01,  4.4343e-02, -1.1196e-01, -1.0231e-03, -9.9206e-03,
          6.2436e-02,  5.1555e-02,  1.0264e-01, -4.7560e-02,  1.1569e-01,
          5.7760e-02, -5.1481e-02,  3.1187e-02,  1.0843e-02,  1.6946e-03,
         -9.8104e-02,  1.2487e-01,  4.9549e-02, -6.4053e-02,  4.9612e-02,
          9.1507e-02,  9.7179e-02, -9.7871e-02,  6.7297e-02, -4.9381e-02,
         -3.8144e-02,  1.0243e-01,  3.2595e-02, -3.9338e-02,  1.1845e-01,
         -8.1794e-02,  9.7819e-02, -5.1667e-02, -1.1482e-01,  4.4470e-02,
         -2.8988e-03,  7.9396e-02, -7.8607e-02,  1.0526e-01, -4.2752e-02],
        [ 1.0955e-01,  4.8004e-02,  3.2397e-02,  3.4339e-02, -1.1132e-01,
          1.0589e-01,  8.7578e-02, -7.5346e-02, -2.3077e-02, -2.7315e-02,
          7.1867e-02, -8.9190e-02, -6.9571e-02, -2.3352e-04,  5.8229e-02,
         -2.1809e-03, -4.7928e-02,  9.5225e-02,  4.2706e-02,  1.0974e-01,
          1.1307e-01,  1.0599e-01, -2.3031e-02,  1.1409e-01, -4.7209e-02,
         -5.9337e-02, -9.2169e-02, -7.9493e-02, -6.5236e-02,  3.4885e-02,
          1.0364e-01,  1.1474e-01,  5.3987e-02, -9.2154e-02, -3.3679e-02,
          1.1838e-02,  1.4456e-02, -9.1507e-02, -1.0978e-02,  7.8048e-02,
         -1.7644e-02,  9.0853e-02,  5.6694e-02, -2.5896e-03, -6.1217e-02,
         -1.0350e-01,  4.4196e-02, -8.6999e-02,  9.8554e-02,  4.5385e-02,
         -1.2163e-01, -1.2214e-01,  1.1512e-01,  6.2625e-03,  1.2238e-02,
         -1.2423e-01,  1.1331e-01, -1.2531e-01,  8.2710e-02, -4.4940e-02],
        [-3.0085e-02,  3.8706e-02,  8.4378e-02, -5.4317e-02,  3.9812e-03,
         -1.0402e-01,  1.0845e-01, -8.3468e-02, -7.2716e-02, -6.2164e-02,
         -6.5018e-02,  1.0793e-01, -3.2918e-02,  5.4166e-05, -7.5525e-02,
          5.7055e-02, -7.7655e-02, -8.9118e-02,  9.4000e-02, -1.0995e-01,
         -1.2352e-01, -5.9220e-02, -8.2974e-02, -5.7302e-02,  1.5609e-02,
         -1.2284e-01,  6.3260e-03, -7.5324e-02,  8.5237e-02,  1.1582e-01,
          2.7184e-03,  9.7957e-02,  9.8548e-02, -3.8990e-02, -1.1632e-01,
         -4.2078e-02, -1.2860e-01,  3.8189e-02, -4.3375e-02,  8.3966e-03,
          7.8744e-02,  6.0847e-02,  3.0520e-02, -1.0597e-01, -4.5798e-03,
         -2.7219e-02,  9.6288e-02,  5.9863e-02,  4.3751e-02, -8.4328e-02,
         -6.4868e-02,  2.6567e-02, -1.5164e-02,  1.0628e-01,  2.0222e-02,
          7.9692e-02,  1.2675e-01, -6.1683e-02,  8.6366e-02, -9.7108e-02],
        [ 5.2049e-02,  5.8477e-02, -2.1854e-02,  1.1417e-01,  3.6167e-02,
          8.5150e-02,  7.1177e-02, -1.0457e-01, -4.9616e-02,  1.1462e-01,
          1.2540e-01, -1.1023e-01, -1.4444e-02,  8.2458e-02,  4.3347e-02,
          2.1713e-02,  4.5096e-02, -8.6695e-02, -1.1800e-01, -3.4499e-02,
         -3.3037e-02,  6.5183e-02, -5.5824e-02, -1.7711e-02, -3.2948e-02,
          3.7119e-02,  3.9931e-02,  8.4765e-02, -1.1503e-02, -1.0770e-01,
          9.0663e-02,  2.4122e-04,  3.0893e-02, -9.9923e-02,  7.0852e-02,
         -1.0870e-01,  1.9549e-02,  1.0645e-02,  4.2231e-02, -1.3988e-02,
          1.0743e-01, -4.0840e-02,  3.7030e-02, -1.9841e-02,  1.2380e-01,
          1.0932e-01,  9.7837e-02, -4.2590e-02,  3.5842e-02, -5.3193e-02,
         -1.1814e-01,  1.0496e-01,  9.8266e-02, -5.5253e-02, -7.5407e-02,
         -1.2650e-01, -1.0021e-01, -4.8345e-02, -7.2046e-02, -1.1478e-01],
        [ 1.0777e-01,  5.4178e-02,  3.9708e-02,  2.1861e-02,  1.5687e-02,
          5.2035e-02,  7.5276e-02, -1.0342e-01, -1.2357e-01,  9.8690e-03,
          6.1068e-02,  5.2973e-02, -4.8771e-02,  4.1235e-02,  1.0646e-01,
          5.7729e-02, -1.2204e-01, -5.1925e-02, -1.0611e-01,  5.0993e-02,
          1.2388e-01, -2.8327e-02,  1.1061e-01,  9.8941e-02, -8.1721e-02,
         -1.2586e-01, -6.6573e-02,  1.2495e-01, -9.1882e-02,  1.4345e-02,
         -3.7698e-02,  9.5678e-02, -8.9728e-02, -1.2532e-01, -6.8664e-02,
         -1.0077e-01, -1.2485e-01, -1.0039e-01,  1.7366e-02, -9.2025e-02,
         -1.2193e-01,  8.4152e-02,  1.1543e-01, -1.5280e-04, -5.1611e-03,
          9.3243e-02, -6.9181e-02,  1.1578e-01,  1.2789e-01,  8.4413e-02,
         -1.2359e-01,  1.2410e-01,  5.4242e-02, -4.2676e-02,  8.6314e-02,
         -1.2145e-01, -4.4658e-02,  1.0574e-01, -1.1472e-01,  6.9706e-02],
        [ 6.5161e-03,  5.0922e-02, -5.6269e-02, -3.0558e-02,  3.1047e-02,
          9.0965e-02, -1.1643e-01, -8.0907e-02, -1.1244e-02, -8.4535e-02,
          2.2675e-02,  5.3199e-02, -2.4098e-02, -8.7604e-02,  2.2962e-03,
          1.6083e-02, -2.4835e-02, -8.4841e-04,  6.9693e-02,  7.1168e-02,
         -6.2643e-02,  3.5149e-02,  5.4904e-02, -1.1310e-01,  9.0528e-02,
         -1.0193e-01, -1.1822e-01,  6.7618e-03,  1.0250e-01,  1.2602e-01,
          1.1590e-01,  9.0234e-03,  8.2840e-03,  8.0452e-03, -9.4796e-02,
         -1.2143e-01, -4.2476e-02,  6.2538e-02, -1.1794e-01, -7.1803e-02,
          1.2123e-01,  4.5182e-02,  3.5699e-02, -3.9896e-02, -2.9128e-02,
          1.0832e-01,  5.0583e-02,  1.0019e-01, -1.2803e-01,  3.9198e-02,
          6.2140e-02,  5.6023e-02, -5.2505e-02, -1.1529e-01,  5.9731e-02,
         -6.7360e-02,  6.9474e-02,  5.3126e-03, -2.2815e-02,  5.5029e-03],
        [-9.1546e-02, -1.4077e-02,  1.2508e-01, -5.5350e-02, -1.2050e-01,
          4.3662e-02,  1.1357e-01, -7.5394e-04,  9.5602e-02, -1.5148e-02,
         -2.3044e-02, -9.9336e-02, -5.4554e-02, -8.3432e-02,  3.3011e-02,
         -8.7373e-02,  1.0024e-01,  7.9814e-02,  6.3561e-02, -4.5564e-02,
         -2.7423e-02,  6.9591e-02, -7.4324e-02, -4.9883e-02,  7.2256e-02,
          8.3420e-02,  1.0621e-01,  5.4662e-02,  1.9535e-02,  8.5969e-02,
          1.2477e-01, -1.8929e-02,  3.0651e-02, -1.0773e-01,  4.5250e-02,
          9.2125e-03, -7.5953e-02, -9.0117e-02,  3.8775e-02,  1.0886e-01,
          7.4212e-02, -2.9865e-02, -8.0774e-02, -7.5558e-02,  1.0732e-02,
          1.1173e-01, -1.0181e-01,  1.0239e-01, -9.2329e-02,  1.0044e-01,
         -7.8311e-02, -9.0443e-02, -8.6252e-02, -9.1338e-02,  8.5589e-02,
         -7.1082e-02, -1.2312e-02, -1.2249e-01,  4.2000e-02, -7.2630e-02],
        [-8.7615e-02, -5.4850e-02, -9.0432e-02, -7.0952e-02, -1.2291e-01,
         -2.9952e-02, -5.4879e-02, -2.6346e-03, -5.7508e-02,  2.3920e-02,
          2.0443e-02,  4.9481e-02, -5.7710e-02, -7.6047e-03,  1.0627e-01,
          5.6075e-02, -4.2638e-03, -9.4184e-02, -3.2324e-02, -3.0417e-02,
         -1.1748e-01,  1.0028e-01,  2.6063e-02,  1.2815e-01, -1.5617e-02,
         -1.0777e-02, -5.2505e-02,  9.2385e-02,  6.3559e-02, -5.8406e-02,
         -9.3385e-02,  1.1813e-01,  7.4189e-02, -5.1628e-03,  3.5948e-03,
          5.9901e-02,  6.0983e-02,  1.3936e-02, -1.3937e-02, -3.5012e-02,
          6.0801e-02,  2.2357e-02, -6.0733e-02, -9.1602e-03, -4.8544e-02,
          5.8872e-02, -3.8335e-02, -1.0714e-01,  4.0171e-02,  3.4495e-02,
         -1.1599e-01,  9.9737e-02, -7.1050e-02, -3.1361e-03, -1.0286e-01,
         -6.1321e-02,  1.0879e-01,  2.0422e-02, -7.8263e-02,  1.7409e-02],
        [-1.0652e-01, -1.1840e-01, -1.0681e-01, -8.5379e-02,  4.0005e-02,
          1.2704e-02, -6.7726e-02, -4.8161e-02, -1.0163e-01, -1.1618e-01,
          6.5847e-02, -9.0188e-02,  4.1038e-02,  8.8101e-02, -1.0805e-01,
         -2.1096e-02, -1.1908e-01,  9.4948e-02, -1.4950e-02,  6.2474e-02,
         -1.0710e-01,  1.1839e-01,  4.3317e-02,  3.4554e-02,  7.7462e-02,
         -9.7047e-03,  8.8060e-02, -9.6333e-02, -1.1456e-01, -1.8947e-02,
         -6.0117e-02,  4.4904e-02,  1.3118e-02, -1.2114e-01, -7.4878e-03,
          1.8010e-02,  1.2726e-01,  6.4233e-02, -8.0996e-02, -2.9081e-02,
          3.2686e-02, -7.4530e-02,  4.3252e-02, -3.4391e-02,  3.8930e-03,
          1.2132e-01,  7.9597e-02,  1.1899e-01,  3.5353e-02,  1.2190e-01,
         -1.0795e-01,  5.3253e-03, -4.8033e-02,  1.0856e-01, -4.0871e-02,
         -6.6763e-02, -1.0683e-01, -6.1443e-02, -1.0282e-01, -3.0951e-02]],

torch.Size([10, 60])

If the earlier printouts made the shape of the tensor hard to see, this one is much clearer. The weight of a linear layer is a rank-2 tensor whose shape is [out_features, in_features].


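python
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

fc2 outputs a tensor of size 60×1. In a linear layer, the operation performed is a matrix multiplication between the weight tensor and the output of the previous layer (each output element is the dot product of one weight row with that input). Here, the weight tensor of the out layer has shape 10×60 and the tensor produced by fc2 has shape 60×1. Multiplying a 10×60 matrix by a 60×1 matrix gives a 10×1 matrix, and this 10×1 matrix is exactly what we need in the end.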
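The shape rule behind this can be checked with a few lines of plain Python. The sketch below only tracks shapes, not values; the helper `matmul_shape` is a hypothetical name introduced here, and it follows the column-vector picture used in the text rather than PyTorch's internal batch layout.

```python
def matmul_shape(a_shape, b_shape):
    """Shape of A @ B for two 2D shapes; raises if the inner dimensions differ."""
    (rows, inner_a), (inner_b, cols) = a_shape, b_shape
    if inner_a != inner_b:
        raise ValueError(f"cannot multiply {a_shape} by {b_shape}")
    return (rows, cols)


# Weight shapes [out_features, in_features] of fc1, fc2 and out, as printed above.
weight_shapes = [(120, 192), (60, 120), (10, 60)]

activation = (192, 1)  # flattened input seen as a column vector
for weight in weight_shapes:
    activation = matmul_shape(weight, activation)
    print(weight, "->", activation)

final_shape = activation  # (10, 1)
```

Each layer's in_features must equal the previous layer's out_features; otherwise the inner dimensions do not match and the multiplication is undefined.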


python
for param in network.parameters():
    print(param.shape)

for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)

torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([12, 6, 5, 5])
torch.Size([12])
torch.Size([120, 192])
torch.Size([120])
torch.Size([60, 120])
torch.Size([60])
torch.Size([10, 60])
torch.Size([10])

conv1.weight 		 torch.Size([6, 1, 5, 5])
conv1.bias 		 torch.Size([6])
conv2.weight 		 torch.Size([12, 6, 5, 5])
conv2.bias 		 torch.Size([12])
fc1.weight 		 torch.Size([120, 192])
fc1.bias 		 torch.Size([120])
fc2.weight 		 torch.Size([60, 120])
fc2.bias 		 torch.Size([60])
out.weight 		 torch.Size([10, 60])
out.bias 		 torch.Size([10])
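As a quick sanity check on these shapes, the number of learnable parameters can be computed by multiplying out each shape. This is a small standalone sketch; the dictionary simply copies the shapes from the named_parameters() output above, and the variable names are chosen here for illustration.

```python
from math import prod

# Shapes copied from the named_parameters() output above.
param_shapes = {
    "conv1.weight": (6, 1, 5, 5),
    "conv1.bias": (6,),
    "conv2.weight": (12, 6, 5, 5),
    "conv2.bias": (12,),
    "fc1.weight": (120, 192),
    "fc1.bias": (120,),
    "fc2.weight": (60, 120),
    "fc2.bias": (60,),
    "out.weight": (10, 60),
    "out.bias": (10,),
}

# Number of scalar values in each tensor is the product of its dimensions.
counts = {name: prod(shape) for name, shape in param_shapes.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name:<14}{count:>7}")
print("total", total)  # total 32998
```

Most of the parameters sit in fc1.weight (23040 of 32998), which is typical for a small CNN whose first linear layer takes the flattened feature maps as input.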

2.4.Callable Neural Networks - Linear Layers in Depth


In the last post, we learned how a linear layer uses matrix multiplication to transform input features into output features. In this post, the author mainly demonstrates how a single linear layer performs its forward pass in PyTorch.

Let's start with linear algebra first.

python
import torch

in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

# multiply the 3x4 weight matrix by the 4-element input vector
print(weight_matrix.matmul(in_features))

This performs a simple matrix multiplication, and the result is as follows:

tensor([30., 40., 50.])

What will happen if we use a linear layer?

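python
import torch.nn as nn

in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

fc = nn.Linear(in_features=4, out_features=3)

print(weight_matrix.matmul(in_features))  # plain matrix multiplication
print(fc(in_features))                    # linear layer with its own random weights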

tensor([30., 40., 50.])
tensor([-0.0445,  2.4113, -2.7080], grad_fn=<ViewBackward0>)

Clearly, the result differs from the earlier one. The reason is that the weight matrix inside the linear layer is initialized with random values, so we need to set it explicitly:

python
in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

fc = nn.Linear(in_features=4, out_features=3)
fc.weight = nn.Parameter(weight_matrix)  # replace the random weights with ours

print(weight_matrix.matmul(in_features))
print(fc(in_features))

tensor([30., 40., 50.])
tensor([29.8636, 39.6883, 50.2340], grad_fn=<ViewBackward0>)

The result is now close to the earlier one but still not equal to it. That is because the layer adds a bias by default, and that bias is also initialized randomly. So we disable the bias:

python
in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

fc = nn.Linear(in_features=4, out_features=3, bias=False)
fc.weight = nn.Parameter(weight_matrix)  # replace the random weights with ours

print(weight_matrix.matmul(in_features))
print(fc(in_features))

tensor([30., 40., 50.])
tensor([30., 40., 50.], grad_fn=<SqueezeBackward4>)

Mathematical notation for the linear transformation: $y = Ax + b$.

| Variable | Definition |
| --- | --- |
| $A$ | Weight matrix tensor |
| $x$ | Input tensor |
| $b$ | Bias tensor |
| $y$ | Output tensor |

If the parameter bias is set to False, the linear transformation becomes $y = Ax$.
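The same transformation can be written out by hand in plain Python, which makes the role of each symbol explicit. This is a minimal sketch, not PyTorch's implementation: the function name `linear` and the bias values are assumptions made up for illustration, while `A` and `x` reuse the weight matrix and input from the examples above.

```python
def linear(A, x, b=None):
    """Compute y = A x + b for a matrix A (list of rows) and a vector x."""
    # Each output element is the dot product of one row of A with x.
    y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    if b is not None:
        y = [y_i + b_i for y_i, b_i in zip(y, b)]
    return y


A = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
x = [1, 2, 3, 4]
b = [0.5, -0.5, 1.0]  # hypothetical bias values

y_no_bias = linear(A, x)
y_bias = linear(A, x, b)
print(y_no_bias)  # [30, 40, 50]
print(y_bias)     # [30.5, 39.5, 51.0]
```

Without a bias the result matches the matrix multiplication from earlier; with a bias every output element is shifted by its own constant, which is why the default nn.Linear output did not match exactly.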

For the rest, just watch the video: 20-Callable Neural Networks - Linear Layers in Depth_哔哩哔哩_bilibili, starting at 06:37.


2.5.How to Debug PyTorch Source Code - Debugging Setup

This part covers debugging in VS Code. It is too detailed to summarize here, so just watch the video: 21-How to Debug PyTorch Source Code - Deep Learning in Python_哔哩哔哩_bilibili


2.6.CNN Forward Method - Deep Learning Implementation

Recall the network we built earlier, which consists of 5 layers:

python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before any layer is assigned as an attribute
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

Strictly speaking, a neural network also has an input layer, so if we count it, the network we built has 6 layers. However, the input layer simply holds the data without transforming it; it can be seen as the identity transformation $f(x)=x$. For that reason, the input layer is usually omitted when constructing neural networks.

But since we are beginners, let's include this layer for the sake of completeness:

python
    def forward(self, t):
        # (1) input layer
        t = t
        return t

So in our input layer, t comes in and t goes out; the tensor t is not changed at all.

(1) convolutional layers

Next, we continue building the forward pass for the two convolutional layers. After the convolution in each layer, we also apply an activation function (ReLU) and a pooling operation (max pooling):

python
    # F refers to torch.nn.functional (import torch.nn.functional as F)
    def forward(self, t):
        # (1) input layer
        t = t

        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (3) hidden conv layer
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        return t

Before we dive into learning this code, we need to first review some background knowledge on activation functions and pooling.

  • activation function

    Blog: 激活函数(Activation Function)-CSDN博客

    We don't need to fully master all activation functions at this point, but we do need to have a basic understanding of them.

  • pooling

    There are many types of pooling, but here we will introduce just two: average pooling and max pooling.

    • average pooling

      e.g. If kernel_size=2, stride=2:

    • max pooling

      Replace AVERAGE with MAX in the image above.
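Since both operations in the list above are simple arithmetic, they can be sketched in plain Python on a small grid. This is only an illustration under stated assumptions: the helper names `relu`, `pool2d` and `mean` and the 4×4 values are made up here, and the pooling uses kernel_size=2 and stride=2 as in the example above.

```python
def relu(grid):
    """Element-wise max(0, x) on a 2D grid."""
    return [[max(0, v) for v in row] for row in grid]


def mean(values):
    return sum(values) / len(values)


def pool2d(grid, kernel_size, stride, reduce_fn):
    """Apply reduce_fn to every kernel_size x kernel_size window of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(0, rows - kernel_size + 1, stride):
        out_row = []
        for j in range(0, cols - kernel_size + 1, stride):
            window = [
                grid[i + di][j + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            ]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


grid = [
    [1, -3, 2, 4],
    [5, 7, -6, 8],
    [9, 2, 1, 0],
    [-4, 6, 3, 5],
]

activated = relu(grid)                       # negative values become 0
avg_pooled = pool2d(activated, 2, 2, mean)   # [[3.25, 3.5], [4.25, 2.25]]
max_pooled = pool2d(activated, 2, 2, max)    # [[7, 8], [9, 5]]
print(activated)
print(avg_pooled)
print(max_pooled)
```

Both pooling variants halve the height and width here; they differ only in how each 2×2 window is reduced to a single number.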

Now back to the forward pass. Both the ReLU operation and the max pooling operation are pure operations. Neither of them has weights.

Sometimes we hear pooling operations referred to as pooling layers, and activation operations called activation layers. However, what distinguishes a layer from an operation is that a layer has weights. Since pooling operations and activation operations do not have weights, we treat them as simple "operations."

(2) linear layers

Before we pass the input features into the first hidden linear layer, we must reshape our tensor to flatten it. Since the 4th layer is the first linear layer, we include the reshaping operation as part of the 4th layer.

python
    def forward(self, t):
        # (1) input layer
        t = t

        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (3) hidden conv layer
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (4) hidden linear layer
        t = t.reshape(-1, 12 * 4 * 4)  # flatten each sample to 192 features
        t = self.fc1(t)
        t = F.relu(t)

        # (5) hidden linear layer
        t = self.fc2(t)
        t = F.relu(t)

        return t

Now we can address a question that has been open since we started building the network. Why is the in_features of fc1 equal to 12*4*4? And why is the argument passed to reshape 12*4*4 as well?

python
class Network(nn.Module):
    def __init__(self):
        super().__init__()  # must run before any layer is assigned as an attribute
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

This is because the input is a 28x28 grayscale image. After the first convolution, the output feature maps are 24x24, since the kernel size is 5, the stride is 1 and there is no padding (if this is not obvious, try drawing a diagram and simulating the convolution by hand). The activation function does not affect the size of the feature maps. Next, a pooling operation with a kernel size of 2 and a stride of 2 reduces them to 12x12. After the second convolution, the size becomes 8x8, and after the second pooling operation it becomes 4x4. conv2 outputs 12 feature maps, so the total size is 12x4x4.
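The size bookkeeping above can be reproduced with the usual output-size rule for a square input, floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride and p the padding. The sketch below is a standalone illustration; the function name `output_size` is chosen here, and padding 0 is assumed because the layers above do not set it.

```python
def output_size(n, kernel_size, stride=1, padding=0):
    """Side length after a convolution or pooling step on an n x n input."""
    return (n + 2 * padding - kernel_size) // stride + 1


# (kernel_size, stride) of conv1, pool, conv2, pool in the network above.
steps = [(5, 1), (2, 2), (5, 1), (2, 2)]

size = 28
trace = [size]
for kernel_size, stride in steps:
    size = output_size(size, kernel_size, stride)
    trace.append(size)

flattened = 12 * size * size  # 12 feature maps of size x size
print(trace)      # [28, 24, 12, 8, 4]
print(flattened)  # 192
```

The final value 192 is the same number that appears as in_features of fc1 and as the second argument of reshape.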

The final layer in our network is a linear layer that we called the output layer. When we pass our tensor to the output layer, the result will be a prediction tensor. Since our data has 10 prediction classes, we know our output tensor has 10 elements.

python
    def forward(self, t):
        # (1) input layer
        t = t

        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (3) hidden conv layer
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (4) hidden linear layer
        t = t.reshape(-1, 12 * 4 * 4)
        t = self.fc1(t)
        t = F.relu(t)

        # (5) hidden linear layer
        t = self.fc2(t)
        t = F.relu(t)

        # (6) output layer
        t = self.out(t)
        # t = F.softmax(t, dim=1)

        return t

If you are curious about the last (commented-out) line of code, see the video: 22-CNN Forward Method - PyTorch Deep Learning Implementation_哔哩哔哩_bilibili, starting at 08:50.


2.7.Forward Propagation Explained - Pass Image to PyTorch Neural Network


python
import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

# assumed: gradient tracking is disabled, since the printed outputs show no grad_fn
torch.set_grad_enabled(False)

train_set = torchvision.datasets.FashionMNIST(
    root='./data'  # placeholder path; the original root argument is not shown
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor()  # images must be tensors before entering the network
    ])
)

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))

        t = F.relu(self.fc2(t))

        t = self.out(t)

        return t


network = Network()

sample = next(iter(train_set))
image, label = sample
print(image.shape)
# take a single image

print(image.unsqueeze(0).shape)  # This gives us a batch with size 1
pred = network(image.unsqueeze(0))  # image shape needs to be (batch_size x in_channels x H x W)
# the network must be given a batch of images rather than a single image,
# so we build a batch by hand, even though it contains only one image

print(pred.shape)
# at this point pred is already the output of the network
print(pred)
# these 10 values are the network's prediction

print(label)
# this label is the one stored in the dataset, i.e. what the image actually is
print(pred.argmax(dim=1))
# this is the class the network predicts: the index of the largest of the 10 values

print(F.softmax(pred, dim=1))
# softmax normalizes the 10 values
print(F.softmax(pred, dim=1).sum())
# so that the probabilities sum to 1



torch.Size([1, 28, 28])
torch.Size([1, 1, 28, 28])
torch.Size([1, 10])
tensor([[-0.0045,  0.0916, -0.1845, -0.0453,  0.1288,  0.0473,  0.0224,  0.0960,  0.0427,  0.0115]])
tensor([[0.0972, 0.1070, 0.0812, 0.0933, 0.1110, 0.1024, 0.0998, 0.1075, 0.1019, 0.0988]])
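To see what softmax does to the ten raw values, the computation can be repeated by hand. This is a minimal pure-Python sketch, not the PyTorch implementation; the function name `softmax` is defined here for illustration, and `pred` copies the rounded values printed above.

```python
import math


def softmax(values):
    """Exponentiate and normalize so that the results sum to 1."""
    largest = max(values)
    # Subtracting the maximum does not change the result and avoids overflow.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Raw network outputs copied from the printed prediction tensor above.
pred = [-0.0045, 0.0916, -0.1845, -0.0453, 0.1288,
        0.0473, 0.0224, 0.0960, 0.0427, 0.0115]

probs = softmax(pred)
predicted_class = probs.index(max(probs))

print([round(p, 4) for p in probs])
print(sum(probs))
print(predicted_class)  # 4
```

Softmax keeps the ordering of the values, so the predicted class (the index of the largest value) is the same before and after normalization. With an untrained network, all ten probabilities stay close to 0.1.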


python
sample = next(iter(train_set))
image, label = sample

net1 = Network()
net2 = Network()

# two freshly created networks start from different random weights
print(net1(image.unsqueeze(0)))
print(net2(image.unsqueeze(0)))

tensor([[ 0.0903, -0.0831, -0.0396,  0.1106, -0.0635, -0.1026, -0.0343,  0.0900, -0.0440,  0.0216]])
tensor([[ 0.0634,  0.1485, -0.0197, -0.1658,  0.0848, -0.0521,  0.0334,  0.0521, -0.0347, -0.1294]])

2.8.Neural Network Batch Processing - Pass Image Batch to PyTorch CNN


python 复制代码
import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms


train_set = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # required so nn.Module can register the layers below
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))

        t = F.relu(self.fc2(t))

        t = self.out(t)

        return t


network = Network()

data_loader = torch.utils.data.DataLoader(

batch = next(iter(data_loader))
images, labels = batch

preds = network(images)

print(F.softmax(preds, dim=1))
print(F.softmax(preds, dim=1).sum())



python 复制代码
tensor([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
tensor([9, 0, 0, 3, 0, 2, 7, 2, 5, 5, 0, 9, 5, 5, 7, 9, 1, 0, 6, 4])
tensor([False, False, False,  True, False, False, False, False, False, False, False, False, False, False, False, False,
        False, False, False, False])



python 复制代码

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

print(get_num_correct(preds, labels))
tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
tensor([9, 0, 0, 3, 0, 2, 7, 2, 5, 5, 0, 9, 5, 5, 7, 9, 1, 0, 6, 4])
tensor([False, False, False, False, False,  True, False,  True, False, False, False, False, False, False, False, False,
        False, False, False, False])
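The counting logic of `get_num_correct` can be sketched without PyTorch. The version below is a hypothetical list-based helper (`get_num_correct_py` is not part of the course code) that uses the labels printed above and made-up scores that favour class 2 for every image.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def get_num_correct_py(pred_rows, labels):
    """Count rows whose arg-max class equals the label (list-based illustration)."""
    return sum(1 for row, label in zip(pred_rows, labels) if argmax(row) == label)

# Labels copied from the batch printed above.
labels = [9, 0, 0, 3, 0, 2, 7, 2, 5, 5, 0, 9, 5, 5, 7, 9, 1, 0, 6, 4]

# Made-up scores: class 2 has the highest score for every image.
pred_rows = [[0.0, 0.0, 1.0] + [0.0] * 7 for _ in labels]

num_correct = get_num_correct_py(pred_rows, labels)
print(num_correct)
```

Only the two images whose label is 2 are counted, which matches the two `True` entries in the comparison tensor above.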


2.9.CNN Output Size Formula - Bonus Neural Network Debugging Session

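This is another debugging session that gives a more direct view of how the tensor changes as it passes through the network. It is best to watch the video: 25-CNN Output Size Formula - Bonus Neural Network Debugging Session_哔哩哔哩_bilibili

At the end, the video gives a formula for the shape of the tensor after a convolution or pooling operation:

$$O = \frac{n - f + 2p}{s} + 1$$


| Symbol | Meaning |
| --- | --- |
| $O$ | number of pixels along one edge of a single output image (a rank-2 tensor) |
| $n$ | number of pixels along one edge of a single input image (a rank-2 tensor) |
| $f$ | number of pixels along one edge of the filter (convolution kernel or pooling kernel alike) |
| $p$ | zero padding on each side, in pixels |
| $s$ | stride |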
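As a quick check of the formula, the sketch below traces the edge length of a 28 x 28 FashionMNIST image through the two convolution and two pooling layers of the network used in this post. The function name `output_size` is my own, and floor division is used on the assumption that a non-integer result is rounded down, which is the default behaviour of PyTorch's convolution and pooling layers.

```python
def output_size(n, f, p=0, s=1):
    """Edge length after a convolution or pooling layer: (n - f + 2p) / s + 1."""
    return (n - f + 2 * p) // s + 1

size = 28                              # FashionMNIST images are 28 x 28
size = output_size(size, f=5)          # conv1, kernel_size=5        -> 24
size = output_size(size, f=2, s=2)     # max_pool2d, kernel 2, stride 2 -> 12
size = output_size(size, f=5)          # conv2, kernel_size=5        -> 8
size = output_size(size, f=2, s=2)     # max_pool2d, kernel 2, stride 2 -> 4

# conv2 has 12 output channels, so the flattened feature count is 12 * 4 * 4.
print(size, 12 * size * size)
```

The final value of 4 explains the `12 * 4 * 4` that appears in `fc1` and in the `reshape` call.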



3.Section 3: Training Neural Networks

Bird's eye view of the process

From a high-level perspective or bird's eye view of our deep learning project, we prepared our data, built our model, and now, we are ready to train our model.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model's results

The training process can be broken down into 7 distinct steps:

  1. Get batch from the training set.
  2. Pass batch to network.
  3. Calculate the loss (the difference between the predicted values and the true values).
  4. Calculate the gradient of the loss function w.r.t. the network's weights.
  5. Update the weights using the gradients to reduce the loss.
  6. Repeat steps 1-5 until one epoch is completed.
  7. Repeat steps 1-6 for as many epochs as required to obtain the desired level of accuracy.

Therefore, a complete training process can be viewed as a two-level nested for-loop. The outer loop iterates over the epochs, and the inner loop iterates over the batches within each epoch.

We already know how to do the first two steps from earlier posts. The remaining steps are covered in detail in the following sections: we use a loss function for step 3, backpropagation for step 4, and an optimization algorithm for step 5. For the loss function, backpropagation, and the optimization algorithm, PyTorch does most of the heavy lifting for us.

3.1.CNN Training - Using a Single Batch




python 复制代码
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # short for optimizer. This can give us access to the optimizer we will use to update weights.

import torchvision
import torchvision.transforms as transforms


train_set = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # required so nn.Module can register the layers below
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        return t

network = Network()

data_loader = torch.utils.data.DataLoader(train_set, batch_size=100)    # step 1: Get batch from the training set.
batch = next(iter(data_loader))
images, labels = batch

preds = network(images) # step 2: Pass batch to network.
loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

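Everything up to `network = Network()` was covered in earlier sections and is repeated here only for completeness; the new material comes after it. We set the batch to 100 images (`batch_size=100`). There is nothing special about this value; it is a hyperparameter.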
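To give a feel for what the loss value in step 3 means, here is a plain-Python sketch of the cross-entropy computation on raw scores, averaged over the batch. This is an illustration under my own naming (`cross_entropy` here is a local helper), not the internals of `F.cross_entropy`.

```python
import math

def cross_entropy(pred_rows, labels):
    """Mean negative log-softmax probability of the correct class."""
    total = 0.0
    for row, label in zip(pred_rows, labels):
        shift = max(row)  # for numerical stability
        log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in row))
        total += log_sum_exp - row[label]
    return total / len(labels)

# A network that is completely unsure: identical scores for all 10 classes.
uniform = [[0.0] * 10 for _ in range(3)]
loss_uniform = cross_entropy(uniform, [9, 0, 3])

# A network that strongly favours the correct class (class 9).
confident = [[0.0] * 9 + [5.0]]
loss_confident = cross_entropy(confident, [9])

print(loss_uniform, math.log(10))
print(loss_confident)
```

With ten equally likely classes the loss equals ln(10), roughly 2.30, so a freshly initialised network should start near that value and move toward 0 as it learns.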






python 复制代码
preds = network(images) # step 2: Pass batch to network.
loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).





python 复制代码
preds = network(images) # step 2: Pass batch to network.
loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.


tensor([[[[-1.0623e-03, -1.0646e-03, -6.1956e-04, -9.1175e-04, -1.8425e-04],
          [-8.5111e-04, -8.6746e-04, -6.9800e-04, -9.1261e-04, -3.7601e-04],
          [-6.6714e-04, -7.8021e-04, -6.6524e-04, -6.7183e-04,  1.1104e-04],
          [-4.9818e-04, -4.7597e-04, -2.6226e-04,  6.5207e-06,  2.9077e-04],
          [-4.8935e-04, -6.3679e-04, -2.0987e-04, -1.8809e-04,  2.1360e-04]]],

        [[[-3.4436e-03, -3.4338e-03, -4.3864e-03, -4.5025e-03, -3.3317e-03],
          [-3.9869e-03, -3.8186e-03, -4.5539e-03, -4.4803e-03, -2.9815e-03],
          [-3.9687e-03, -3.7480e-03, -4.5039e-03, -4.6454e-03, -3.2644e-03],
          [-3.5405e-03, -2.9284e-03, -4.2258e-03, -4.6919e-03, -3.5234e-03],
          [-3.3612e-03, -2.9792e-03, -4.4591e-03, -4.5373e-03, -3.3610e-03]]],

        [[[ 3.2026e-04,  1.4045e-04,  1.4094e-04,  1.1505e-03,  1.4112e-03],
          [ 6.1431e-04,  6.2140e-04,  8.4316e-04,  1.5054e-03,  1.2524e-03],
          [ 5.7009e-04,  4.6536e-04,  4.9394e-04,  1.0977e-03,  1.1926e-03],
          [ 3.9472e-04,  5.1646e-04,  3.4546e-04,  1.0285e-03,  1.3299e-03],
          [ 3.4585e-04,  5.0746e-04,  4.9153e-04,  1.2860e-03,  1.2003e-03]]],

        [[[-4.6837e-05,  2.8100e-05,  2.5777e-05, -7.9798e-07, -2.3992e-06],
          [-3.5947e-05,  4.4032e-05,  7.3051e-05, -1.0639e-05,  1.7310e-06],
          [-1.2074e-04,  1.6598e-05,  3.8755e-05, -2.4356e-06,  2.5773e-07],
          [-1.5912e-04,  2.3555e-06,  3.8257e-05, -1.3147e-06,  3.6569e-06],
          [-1.8059e-04, -2.4576e-05,  5.3526e-06,  1.0901e-05, -1.4959e-05]]],

        [[[ 4.7523e-03,  4.1332e-03,  4.8969e-03,  5.3612e-03,  4.8580e-03],
          [ 4.3459e-03,  4.5477e-03,  4.8959e-03,  5.3994e-03,  4.8403e-03],
          [ 4.2582e-03,  4.1166e-03,  4.2809e-03,  5.3493e-03,  4.9570e-03],
          [ 4.6786e-03,  3.8365e-03,  4.3224e-03,  4.8648e-03,  5.2457e-03],
          [ 4.5927e-03,  4.0781e-03,  4.5919e-03,  5.1500e-03,  5.4433e-03]]],

        [[[ 1.3431e-04,  2.8257e-04,  1.6781e-04,  4.8167e-06,  3.8115e-05],
          [ 9.9751e-05,  3.7092e-04,  2.0003e-04, -4.4673e-06,  6.9875e-06],
          [ 2.2270e-04,  3.9302e-04,  2.4296e-04, -8.1596e-06, -1.5551e-06],
          [ 2.3624e-04,  2.9259e-04,  3.1815e-04,  1.8888e-05, -1.6435e-05],
          [ 3.2756e-04,  3.6447e-04,  3.9779e-04,  3.3469e-05,  3.9165e-06]]]])
torch.Size([6, 1, 5, 5])
torch.Size([6, 1, 5, 5])

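Once backpropagation has run, the gradients are available. Note that the gradient tensor has the same shape as the weight tensor: every weight parameter in the weight tensor has its own corresponding gradient.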
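The one-gradient-per-weight idea can be illustrated on a tiny model without PyTorch. The sketch below uses a made-up three-weight linear model with a squared-error loss and estimates each gradient by finite differences; all names and numbers are illustrative assumptions, and real backpropagation computes the gradients analytically rather than this way.

```python
def loss(weights, x, y):
    """Squared error of a tiny linear model: (w . x - y) ** 2."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    return (pred - y) ** 2

def numerical_gradient(weights, x, y, eps=1e-6):
    """Estimate d(loss)/d(w_i) for every weight with central differences."""
    grads = []
    for i in range(len(weights)):
        up = weights[:]
        up[i] += eps
        down = weights[:]
        down[i] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2 * eps))
    return grads

weights = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
y = 1.0

grads = numerical_gradient(weights, x, y)
print(len(weights), len(grads))  # same number of entries
print(grads)
```

The gradient list has exactly one entry per weight, which is the list-sized analogue of the gradient and weight tensors sharing the shape `[6, 1, 5, 5]` above.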




python 复制代码
preds = network(images) # step 2: Pass batch to network.
loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

# print(network.conv1.weight.grad)
loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
# print(network.conv1.weight.grad)
# print(network.conv1.weight.grad.shape)
# print(network.conv1.weight.shape)

optimizer = optim.Adam(network.parameters(), lr=0.01)
print(get_num_correct(preds, labels))
optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.
preds = network(images)
loss = F.cross_entropy(preds, labels)
print(get_num_correct(preds, labels))

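Three points are worth noting. First, we set the optimizer's learning rate to 0.01 with `lr=0.01` (`lr` is short for learning rate). This is also a hyperparameter and has to be tuned by hand based on the results. If the learning rate is too small, training may get stuck in a local minimum; if it is too large, training may overshoot and never find the minimum. Learning rates will be covered in a separate supplementary post. Second, we use Adam as the optimizer here; SGD is another option, and these will also be covered separately later. Third, we use the `get_num_correct` function introduced in Section 2.8; its definition is repeated here: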

python 复制代码
def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()
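The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent (not Adam) on f(x) = x², a convex function chosen by me for illustration; it therefore shows only the "too slow" and "too large" cases, since a local minimum cannot occur here.

```python
def gradient_descent(lr, steps=20, start=1.0):
    """Minimise f(x) = x**2 (gradient 2x) with plain gradient descent."""
    x = start
    for _ in range(steps):
        x = x - lr * 2 * x  # move against the gradient, scaled by lr
    return x

small = gradient_descent(lr=0.001)  # barely moves away from the start
good = gradient_descent(lr=0.1)     # approaches the minimum at 0
large = gradient_descent(lr=1.1)    # overshoots further on every step

print(small, good, large)
```

After the same 20 steps, the small rate has hardly left the starting point, the moderate rate is close to the minimum, and the large rate has moved away from it.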



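As we can see, after step 5 the loss goes down and the number of correct predictions goes up, which shows that the optimization works. This completes the first five steps: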

  1. Get batch from the training set.
  2. Pass batch to network.
  3. Calculate the loss (the difference between the predicted values and the true values).
  4. Calculate the gradient of the loss function w.r.t. the network's weights.
  5. Update the weights using the gradients to reduce the loss.
  6. Repeat steps 1-5 until one epoch is completed.
  7. Repeat steps 1-6 for as many epochs as required to obtain the desired level of accuracy.

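To recap the work so far: we took 100 of the 60,000 samples as one batch, passed this batch through the network, and then in turn computed the loss, ran backpropagation to obtain the gradients, and updated the weights. Before the weight update the prediction accuracy was 9%; after a single update it was 15%. This is what processing one batch looks like. One epoch consists of processing many batches, and a complete training run consists of many epochs.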

3.2.CNN Training Loop - Using Multiple Epochs




python 复制代码
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # short for optimizer. This can give us access to the optimizer we will use to update weights.

import torchvision
import torchvision.transforms as transforms

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()


train_set = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # required so nn.Module can register the layers below
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        return t

network = Network()

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)    # step 1: Get batch from the training set.
optimizer = optim.Adam(network.parameters(), lr=0.01)

batch = next(iter(train_loader))
images, labels = batch

preds = network(images) # step 2: Pass batch to network.
loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.


python 复制代码
network = Network()

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)    # step 1: Get batch from the training set.
optimizer = optim.Adam(network.parameters(), lr=0.01)

for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
    images, labels = batch

    preds = network(images) # step 2: Pass batch to network.
    loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

    optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
    loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
    optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.






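Since we have 60,000 images in total and each batch holds 100 of them, there are 600 batches. In other words, in one epoch we take 600 steps toward the minimum of the loss function, each scaled by the learning rate of 0.01. If we increase the batch size, the number of batches goes down, and so does the number of steps we take toward the minimum per epoch.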
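The relationship between batch size and the number of weight updates per epoch is simple arithmetic, sketched below. The helper `batches_per_epoch` is a name of my own; it rounds up because, with the default `DataLoader` settings, a final partial batch is still yielded.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of batches in one pass over the data (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

for batch_size in (100, 1000, 128):
    steps = batches_per_epoch(60000, batch_size)
    print(f"batch_size={batch_size}: {steps} weight updates per epoch")
```

A larger batch size means fewer, but individually less noisy, updates per epoch, which is one reason batch size and learning rate are usually tuned together.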


python 复制代码
for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
    images, labels = batch

    preds = network(images) # step 2: Pass batch to network.
    loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

    optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
    loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
    optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.

print(get_num_correct(preds, labels))




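For this part it is worth watching the debugging session: 27-CNN Training Loop Explained - Neural Network Code Project_哔哩哔哩_bilibili, from 10:57 to 17:47. The goal is not to learn how to debug but to see more directly how the training process works.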




python 复制代码
for epoch in range(5):
    for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
        images, labels = batch

        preds = network(images) # step 2: Pass batch to network.
        loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

        optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
        loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
        optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.




python 复制代码
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

network = Network().to(device)  # the model must live on the same device as the batches

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)    # step 1: Get batch from the training set.
optimizer = optim.Adam(network.parameters(), lr=0.01)

for epoch in range(5):
    for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
        images, labels = batch
        images, labels = images.to(device), labels.to(device)

        preds = network(images) # step 2: Pass batch to network.
        loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

        optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
        loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
        optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.

    print("epoch:", epoch, ", loss:", loss.item(), ", correct:", get_num_correct(preds, labels))


epoch: 0 , loss: 0.4874052107334137 , correct: 83
epoch: 1 , loss: 0.3818645179271698 , correct: 83
epoch: 2 , loss: 0.3500566780567169 , correct: 87
epoch: 3 , loss: 0.3230673670768738 , correct: 87
epoch: 4 , loss: 0.2824578881263733 , correct: 90


epoch: 0 , loss: 0.446563184261322 , correct: 86
epoch: 1 , loss: 0.3932000994682312 , correct: 86
epoch: 2 , loss: 0.3111892640590668 , correct: 92
epoch: 3 , loss: 0.393922358751297 , correct: 85
epoch: 4 , loss: 0.2904859781265259 , correct: 88


python 复制代码
for epoch in range(5):
    total_loss = 0
    total_correct = 0

    for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
        images, labels = batch
        images, labels = images.to(device), labels.to(device)

        preds = network(images) # step 2: Pass batch to network.
        loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

        optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
        loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
        optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.

        total_loss += loss.item()
        total_correct += get_num_correct(preds, labels)

    # print("epoch:", epoch, ", loss:", loss.item(), ", correct:", get_num_correct(preds, labels))
    print(f"epoch: {epoch}, average loss: {total_loss/len(train_loader):.10f}, average correct: {total_correct/len(train_loader):.10f}")


epoch: 0, average loss: 0.5749395862, average correct: 78.2483333333
epoch: 1, average loss: 0.3957502919, average correct: 85.2950000000
epoch: 2, average loss: 0.3672552117, average correct: 86.4583333333
epoch: 3, average loss: 0.3454391903, average correct: 87.1033333333
epoch: 4, average loss: 0.3291498821, average correct: 87.7916666667


epoch: 0, average loss: 0.5677242728, average correct: 78.6033333333
epoch: 1, average loss: 0.3984764252, average correct: 85.2316666667
epoch: 2, average loss: 0.3701192620, average correct: 86.2533333333
epoch: 3, average loss: 0.3539418967, average correct: 86.6983333333
epoch: 4, average loss: 0.3486876344, average correct: 87.1116666667
epoch: 5, average loss: 0.3347845890, average correct: 87.5100000000
epoch: 6, average loss: 0.3330437982, average correct: 87.6533333333
epoch: 7, average loss: 0.3283978304, average correct: 87.8016666667
epoch: 8, average loss: 0.3196323512, average correct: 88.2450000000
epoch: 9, average loss: 0.3210363051, average correct: 88.1016666667


python 复制代码
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

network = Network().to(device)  # the model must live on the same device as the batches

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)    # step 1: Get batch from the training set.
optimizer = optim.Adam(network.parameters(), lr=0.005)

for epoch in range(20):
    total_loss = 0
    total_correct = 0

    for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
        images, labels = batch
        images, labels = images.to(device), labels.to(device)

        preds = network(images) # step 2: Pass batch to network.
        loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

        optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
        loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
        optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.

        total_loss += loss.item()
        total_correct += get_num_correct(preds, labels)

    # print("epoch:", epoch, ", loss:", loss.item(), ", correct:", get_num_correct(preds, labels))
    print(f"epoch: {epoch}, average loss: {total_loss/len(train_loader):.10f}, average correct: {total_correct/len(train_loader):.10f}")
epoch: 0, average loss: 0.5835410944, average correct: 78.2783333333
epoch: 1, average loss: 0.3760822483, average correct: 86.1400000000
epoch: 2, average loss: 0.3373622125, average correct: 87.4166666667
epoch: 3, average loss: 0.3163931920, average correct: 88.1750000000
epoch: 4, average loss: 0.2989334713, average correct: 88.6900000000
epoch: 5, average loss: 0.2870471949, average correct: 89.0916666667
epoch: 6, average loss: 0.2794247842, average correct: 89.4466666667
epoch: 7, average loss: 0.2709626081, average correct: 89.7316666667
epoch: 8, average loss: 0.2623496591, average correct: 90.0450000000
epoch: 9, average loss: 0.2603187489, average correct: 90.1150000000
epoch: 10, average loss: 0.2544594714, average correct: 90.2766666667
epoch: 11, average loss: 0.2514673166, average correct: 90.4633333333
epoch: 12, average loss: 0.2448574879, average correct: 90.6283333333
epoch: 13, average loss: 0.2418817339, average correct: 90.8016666667
epoch: 14, average loss: 0.2381014472, average correct: 91.0650000000
epoch: 15, average loss: 0.2359832367, average correct: 91.1016666667
epoch: 16, average loss: 0.2316297442, average correct: 91.2266666667
epoch: 17, average loss: 0.2267002342, average correct: 91.4033333333
epoch: 18, average loss: 0.2294180033, average correct: 91.3266666667
epoch: 19, average loss: 0.2253785118, average correct: 91.4433333333





3.3.Building a Confusion Matrix - Analyzing Results Part 1


Here is a brief introduction to the confusion matrix.


For a binary classification problem, the confusion matrix is usually a 2×2 matrix with four entries: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
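The four entries can be counted directly from paired true and predicted labels. The sketch below uses made-up binary labels (1 = positive, 0 = negative) and a helper name of my own, `binary_confusion`, purely for illustration.

```python
def binary_confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Made-up example labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = binary_confusion(y_true, y_pred)

# Laid out as a 2x2 matrix: rows are true classes, columns are predictions.
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```

The multi-class case built later in this section follows the same idea, with one row per true class and one column per predicted class.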


True = holothurian | True = echinus | True = starfish | True = scallop | True = background



python 复制代码
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # short for optimizer. This can give us access to the optimizer we will use to update weights.

import torchvision
import torchvision.transforms as transforms

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()


train_set = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # required so nn.Module can register the layers below
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        return t

network = Network()

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)    # step 1: Get batch from the training set.
optimizer = optim.Adam(network.parameters(), lr=0.005)

for epoch in range(5):
    total_loss = 0
    total_correct = 0

    for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
        images, labels = batch

        preds = network(images) # step 2: Pass batch to network.
        loss = F.cross_entropy(preds, labels)   # step 3: Calculate the loss(difference between the predicted values and the true values).

        optimizer.zero_grad()   # clear old gradients; PyTorch accumulates them across backward() calls
        loss.backward() # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
        optimizer.step()    # step 5: Update the weights using the gradients to reduce the loss.

        total_loss += loss.item()
        total_correct += get_num_correct(preds, labels)

    print(f"epoch: {epoch}, average loss: {total_loss/len(train_loader):.10f}, average correct: {total_correct/len(train_loader):.10f}")


python 复制代码
@torch.no_grad()  # inference only: do not track gradients inside this function
def get_all_preds(model, loader):
    all_preds = torch.tensor([])

    for batch in loader:
        images, labels = batch

        preds = model(images)
        all_preds = torch.cat((all_preds, preds), dim=0)

    return all_preds

prediction_loader = torch.utils.data.DataLoader(train_set, batch_size=10000)
train_preds = get_all_preds(network, prediction_loader)

print(f"train_set.targets.shape: {train_set.targets.shape}")
print(f"train_set.targets: {train_set.targets}")
print(f"train_preds.argmax(dim=1).shape: {train_preds.argmax(dim=1).shape}")
print(f"train_preds.argmax(dim=1): {train_preds.argmax(dim=1)}")


epoch: 0, average loss: 0.6142710297, average correct: 76.9300000000
epoch: 1, average loss: 0.4064236009, average correct: 85.1500000000
epoch: 2, average loss: 0.3550885973, average correct: 86.7516666667
epoch: 3, average loss: 0.3270902198, average correct: 87.8633333333
epoch: 4, average loss: 0.3086743477, average correct: 88.6150000000
train_set.targets.shape: 			torch.Size([60000])
train_set.targets: 					tensor([9, 0, 0,  ..., 3, 0, 5])
train_preds.argmax(dim=1).shape: 	torch.Size([60000])
train_preds.argmax(dim=1): 			tensor([9, 0, 0,  ..., 3, 0, 5])

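The code above contains something we have not seen before: `@torch.no_grad()`. This is a decorator that switches off gradient tracking for the decorated function. For the decorator syntax itself, see: 【Python 高级特性】装饰器:不修改代码,就能改变函数功能的强大特性_哔哩哔哩_bilibili

For more on turning off gradients, see: 28-CNN Confusion Matrix with PyTorch - Neural Network Programming_哔哩哔哩_bilibili, from 08:30 to 12:46.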



python 复制代码
stacked = torch.stack(
    (train_set.targets, train_preds.argmax(dim=1)),  # (true label, predicted label)
    dim=1
)

print(f"stacked.shape: {stacked.shape}")
print(f"stacked: {stacked}")
stacked.shape: 						torch.Size([60000, 2])
stacked: 							tensor([[9, 9],
                                            [0, 0],
                                            [0, 0],
                                            [3, 3],
                                            [0, 0],
                                            [5, 5]])

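Viewed as matrices, this is equivalent to combining the two vectors as rows and then transposing. The left column of `stacked` holds the true labels and the right column holds the predicted labels. Because this is a 10-class problem, the confusion matrix is $10 \times 10$, so we create a $10 \times 10$ tensor: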

python 复制代码
cmt = torch.zeros(10, 10, dtype=torch.int32)
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.int32)


python 复制代码
for p in stacked:
    tl, pl = p.tolist() # true label & predict label
    cmt[tl, pl] = cmt[tl, pl] + 1


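Here, `.tolist()` turns each $1 \times 2$ row of `stacked` into two plain numbers. After 60,000 iterations of the loop, every row of the $60000 \times 2$ tensor `stacked` has been unpacked and counted. The result is: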

tensor([[5111,    6,   32,  109,   15,    0,  670,    0,   57,    0],
        [   8, 5842,    3,  126,    3,    0,   12,    0,    6,    0],
        [  86,    1, 4084,   55,  856,    1,  826,    0,   91,    0],
        [ 159,   16,    7, 5495,  160,    0,  152,    0,   11,    0],
        [  13,    8,  160,  210, 4894,    0,  669,    0,   46,    0],
        [   1,    0,    0,    1,    0, 5651,    1,  187,   68,   91],
        [ 779,    7,  212,  108,  334,    1, 4464,    0,   95,    0],
        [   0,    0,    0,    0,    0,   17,    0, 5881,   26,   76],
        [  15,    1,    5,   19,   18,    1,   31,    2, 5908,    0],
        [   0,    0,    0,    0,    0,    7,    0,  323,    8, 5662]], dtype=torch.int32)
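A few summary numbers can be read off this matrix directly. The sketch below copies the values printed above into a plain list of lists and computes the overall accuracy, the class with the lowest recall, and the most frequent confusion; the variable names are my own and the class indices follow the FashionMNIST label order.

```python
# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
cmt = [
    [5111,    6,   32,  109,   15,    0,  670,    0,   57,    0],
    [   8, 5842,    3,  126,    3,    0,   12,    0,    6,    0],
    [  86,    1, 4084,   55,  856,    1,  826,    0,   91,    0],
    [ 159,   16,    7, 5495,  160,    0,  152,    0,   11,    0],
    [  13,    8,  160,  210, 4894,    0,  669,    0,   46,    0],
    [   1,    0,    0,    1,    0, 5651,    1,  187,   68,   91],
    [ 779,    7,  212,  108,  334,    1, 4464,    0,   95,    0],
    [   0,    0,    0,    0,    0,   17,    0, 5881,   26,   76],
    [  15,    1,    5,   19,   18,    1,   31,    2, 5908,    0],
    [   0,    0,    0,    0,    0,    7,    0,  323,    8, 5662],
]

total = sum(sum(row) for row in cmt)
correct = sum(cmt[i][i] for i in range(10))        # the diagonal holds correct predictions
accuracy = correct / total

# Recall per class: correct predictions divided by the number of true samples of that class.
recall = [cmt[i][i] / sum(cmt[i]) for i in range(10)]
worst_class = recall.index(min(recall))

# Largest off-diagonal entry: the most frequent (true, predicted) mistake.
most_confused = max(
    ((t, p) for t in range(10) for p in range(10) if t != p),
    key=lambda tp: cmt[tp[0]][tp[1]],
)

print(accuracy, worst_class, most_confused)
```

Reading the result against the class names, the weakest class is index 2 (Pullover), and the most common single mistake is a Pullover predicted as class 4 (Coat).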

We can also plot this matrix as a figure. The course provides one way to do it; this blog post is another reference: 动手画混淆矩阵(Confusion Matrix)(含代码)_混淆矩阵怎么画-CSDN博客


python 复制代码
for p in stacked:
    tl, pl = p.tolist() # true label & predict label
    cmt[tl, pl] = cmt[tl, pl] + 1


def draw_confusion_matrix(label_true, label_pred, label_name, title="Confusion Matrix", pdf_save_path=None, dpi=100):
    """
    @param label_true: true labels, e.g. [0,1,2,7,4,5,...]
    @param label_pred: predicted labels, e.g. [0,5,4,2,1,4,...]
    @param label_name: class names, e.g. ['cat','dog','flower',...]
    @param title: figure title
    @param pdf_save_path: where to save the figure, if at all, e.g. pdf_save_path=xxx.png | xxx.pdf | any other format supported by plt.savefig
    @param dpi: resolution of the saved file; papers usually require at least 300 dpi

    example (the remaining arguments are elided in the source):
        draw_confusion_matrix(...,
                          label_name=["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"],
                          title="Confusion Matrix on Fer2013",
                          ...)
    """
    # normalize='true' divides each row by its total, so every row sums to 1
    cm = confusion_matrix(y_true=label_true, y_pred=label_pred, normalize='true')

    plt.imshow(cm, cmap='Blues')
    plt.title(title)
    plt.xlabel("Predict label")
    plt.ylabel("Truth label")
    plt.yticks(range(len(label_name)), label_name)
    plt.xticks(range(len(label_name)), label_name, rotation=45)

    for i in range(len(label_name)):
        for j in range(len(label_name)):
            color = (1, 1, 1) if i == j else (0, 0, 0)  # white text on the diagonal, black elsewhere
            value = float('%.2f' % cm[j, i])
            plt.text(i, j, value, verticalalignment='center', horizontalalignment='center', color=color)

    if pdf_save_path is not None:
        plt.savefig(pdf_save_path, bbox_inches='tight', dpi=dpi)

    plt.show()


# The first two arguments are not visible in the source; they are assumed here
# from the names used earlier in this section.
draw_confusion_matrix(label_true=train_set.targets,
                      label_pred=train_preds.argmax(dim=1),
                      label_name=["T-shirt", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle"],
                      title="Confusion Matrix on FashionMNIST")

python 复制代码
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


pip install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple



3.4.Stack vs Concat - Deep Learning Tensor Ops


python 复制代码
stacked = torch.stack(
    (train_set.targets, train_preds.argmax(dim=1)),  # (true label, predicted label)
    dim=1
)



python 复制代码
import torch

t1 = torch.tensor([1,1,1,1])
t2 = torch.tensor([2,2,2,2])
t3 = torch.tensor([3,3,3,3])

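print(torch.cat((t1,t2,t3),dim=0))
# print(torch.cat((t1,t2,t3),dim=1))   # error: 1-D tensors only have dim 0
# print(torch.cat((t1,t2,t3),dim=2))   # error, for the same reason

print(torch.stack((t1,t2,t3),dim=0))
print(torch.stack((t1,t2,t3),dim=1))
# print(torch.stack((t1,t2,t3),dim=2)) # error: stacking 1-D tensors gives a 2-D result, so dim must be 0 or 1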


tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])


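  • torch.cat((t1,t2,t3),dim=0) joins the tensors end to end along the existing axis, giving shape [12];
  • torch.stack((t1,t2,t3),dim=0) puts each tensor on its own row along a new first axis, giving shape [3, 4];
  • torch.stack((t1,t2,t3),dim=1) is equivalent to stacking into rows and then transposing, giving shape [4, 3];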
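The three cases above can be mimicked with plain Python lists, which may help separate "joining along an existing axis" from "creating a new axis". This is only an analogy using lists of my own construction; it does not call PyTorch.

```python
t1 = [1, 1, 1, 1]
t2 = [2, 2, 2, 2]
t3 = [3, 3, 3, 3]

# cat along dim 0: join end to end, the result is still one-dimensional.
cat_dim0 = t1 + t2 + t3

# stack along dim 0: a new outer axis, one row per input.
stack_dim0 = [t1, t2, t3]

# stack along dim 1: a new inner axis, i.e. the transpose of stack_dim0.
stack_dim1 = [list(column) for column in zip(t1, t2, t3)]

print(cat_dim0)
print(stack_dim0)
print(stack_dim1)
```

The three printed structures have the same contents and layout as the three tensors shown in the output above.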

The video then goes on to cover the same operations in TensorFlow and NumPy, which are not repeated here. If you are interested, have a look: 29-Stack vs Concat in PyTorch, TensorFlow & NumPy - Deep Learning Tensor Ops_哔哩哔哩_bilibili

3.5.Using TensorBoard with PyTorch - Analyzing Results Part 2

30-TensorBoard with PyTorch - Visualize Deep Learning Metrics_哔哩哔哩_bilibili


3.6.Hyperparameter Experimenting - Training Neural Networks


4.Section 4: Neural Network Experimentation


4.1.Custom Code - Neural Network Experimentation Code


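class RunBuilder():
    @staticmethod
    def get_runs(params):
        # Build a namedtuple type whose fields are the hyperparameter names.
        Run = namedtuple('Run', params.keys())

        runs = []
        # Create one Run per element of the Cartesian product of the value lists.
        for v in product(*params.values()):
            runs.append(Run(*v))

        return runs

params = OrderedDict(lr = [.01,.005], batch_size = [100,500])

runs = RunBuilder.get_runs(params)
print(runs)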


from collections import OrderedDict
from collections import namedtuple
from itertools import product


[Run(lr=0.01, batch_size=100), Run(lr=0.01, batch_size=500), Run(lr=0.005, batch_size=100), Run(lr=0.005, batch_size=500)]


| Run index | learning rate | batch size |
| --- | --- | --- |
| 0 | 0.01 | 100 |
| 1 | 0.01 | 500 |
| 2 | 0.005 | 100 |
| 3 | 0.005 | 500 |
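
The table is the Cartesian product of the two value lists, so the number of runs is the product of the list lengths. The sketch below rebuilds the same runs with `itertools.product` directly (without the `RunBuilder` wrapper) to make the count and the row order explicit; it is an illustration, not the course code.

```python
from collections import OrderedDict, namedtuple
from itertools import product
from math import prod

params = OrderedDict(lr=[.01, .005], batch_size=[100, 500])

Run = namedtuple('Run', params.keys())
runs = [Run(*values) for values in product(*params.values())]

# The number of runs is the product of the list lengths: 2 * 2 = 4.
expected = prod(len(values) for values in params.values())
print(len(runs), expected)  # 4 4

# product() varies the last parameter fastest, which gives the row order of the table.
for index, run in enumerate(runs):
    print(index, run.lr, run.batch_size)
```

Adding a third parameter with two values doubles the number of runs, which is why the run list grows quickly once `device` and `epoch` are included.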


params = OrderedDict(lr = [.01,.005], batch_size = [100,500], device = ["cuda","cpu"], epoch = [100,200])

runs = RunBuilder.get_runs(params)
for run in runs:
    comment = f'-{run}'
    print(comment)

-Run(lr=0.01, batch_size=100, device='cuda', epoch=100)
-Run(lr=0.01, batch_size=100, device='cuda', epoch=200)
-Run(lr=0.01, batch_size=100, device='cpu', epoch=100)
-Run(lr=0.01, batch_size=100, device='cpu', epoch=200)
-Run(lr=0.01, batch_size=500, device='cuda', epoch=100)
-Run(lr=0.01, batch_size=500, device='cuda', epoch=200)
-Run(lr=0.01, batch_size=500, device='cpu', epoch=100)
-Run(lr=0.01, batch_size=500, device='cpu', epoch=200)
-Run(lr=0.005, batch_size=100, device='cuda', epoch=100)
-Run(lr=0.005, batch_size=100, device='cuda', epoch=200)
-Run(lr=0.005, batch_size=100, device='cpu', epoch=100)
-Run(lr=0.005, batch_size=100, device='cpu', epoch=200)
-Run(lr=0.005, batch_size=500, device='cuda', epoch=100)
-Run(lr=0.005, batch_size=500, device='cuda', epoch=200)
-Run(lr=0.005, batch_size=500, device='cpu', epoch=100)
-Run(lr=0.005, batch_size=500, device='cpu', epoch=200)
4.2.Custom Code - Simultaneous Hyperparameter Testing

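This lesson mainly covers a helper the video's author wrote to make the training code tidier. After the refactor, setting hyperparameters also becomes clearer and more intuitive. If you're interested, you can watch the video and type out the code yourself, but frankly the approach in the video is overly complicated and I don't think it adds much. Video link: 33-CNN Training Loop Refactoring - Simultaneous Hyperparameter Testing_哔哩哔哩_bilibili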


import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # short for optimizer. This can give us access to the optimizer we will use to update weights.

import torchvision
import torchvision.transforms as transforms

from collections import OrderedDict
from collections import namedtuple
from itertools import product

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

class RunBuilder:
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())
        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))  # one Run per hyperparameter combination
        return runs


train_set = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # nn.Module must be initialized before layers are assigned
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        return t

params = OrderedDict(
    lr=[.01, .005],
    batch_size=[100, 500],
    shuffle=[True, False],
    epoch=[5, 10],
    device=["cuda"],
)

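# First pass: list every run that will be executed.
for run in RunBuilder.get_runs(params):
    comment = f'-{run}'
    print(comment)

# Second pass: train one network per run.
for run in RunBuilder.get_runs(params):
    comment = f'-{run}'
    print(comment)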

    network = Network()

    train_loader = torch.utils.data.DataLoader(train_set, batch_size=run.batch_size, shuffle=run.shuffle)  # step 1: Get batch from the training set.
    optimizer = optim.Adam(network.parameters(), lr=run.lr)

    device = run.device
    network.to(device)  # move the model to the device the batches are sent to

    for epoch in range(run.epoch):
        total_loss = 0
        total_correct = 0

        for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
            images, labels = batch
            images, labels = images.to(device), labels.to(device)

            preds = network(images)  # step 2: Pass batch to network.
            loss = F.cross_entropy(preds, labels)  # step 3: Calculate the loss(difference between the predicted values and the true values).

            optimizer.zero_grad()  # clear the gradients left over from the previous batch
            loss.backward()  # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
            optimizer.step()  # step 5: Update the weights using the gradients to reduce the loss.

            total_loss += loss.item()
            total_correct += get_num_correct(preds, labels)

        print(f"epoch: {epoch}, average loss: {total_loss / len(train_loader):.10f}, average correct: {total_correct / len(train_loader):.10f}")


-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda')
-Run(lr=0.01, batch_size=100, shuffle=True, epoch=10, device='cuda')
-Run(lr=0.01, batch_size=100, shuffle=False, epoch=5, device='cuda')
-Run(lr=0.01, batch_size=100, shuffle=False, epoch=10, device='cuda')
-Run(lr=0.01, batch_size=500, shuffle=True, epoch=5, device='cuda')
-Run(lr=0.01, batch_size=500, shuffle=True, epoch=10, device='cuda')
-Run(lr=0.01, batch_size=500, shuffle=False, epoch=5, device='cuda')
-Run(lr=0.01, batch_size=500, shuffle=False, epoch=10, device='cuda')
-Run(lr=0.005, batch_size=100, shuffle=True, epoch=5, device='cuda')
-Run(lr=0.005, batch_size=100, shuffle=True, epoch=10, device='cuda')
-Run(lr=0.005, batch_size=100, shuffle=False, epoch=5, device='cuda')
-Run(lr=0.005, batch_size=100, shuffle=False, epoch=10, device='cuda')
-Run(lr=0.005, batch_size=500, shuffle=True, epoch=5, device='cuda')
-Run(lr=0.005, batch_size=500, shuffle=True, epoch=10, device='cuda')
-Run(lr=0.005, batch_size=500, shuffle=False, epoch=5, device='cuda')
-Run(lr=0.005, batch_size=500, shuffle=False, epoch=10, device='cuda')

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda')
epoch: 0, average loss: 0.5920416600, average correct: 77.3633333333
epoch: 1, average loss: 0.4087048837, average correct: 84.6916666667
epoch: 2, average loss: 0.3676425663, average correct: 86.3216666667
epoch: 3, average loss: 0.3537376728, average correct: 86.8283333333
epoch: 4, average loss: 0.3422814391, average correct: 87.1783333333

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=10, device='cuda')
epoch: 0, average loss: 0.5492223486, average correct: 79.1516666667
epoch: 1, average loss: 0.3799329826, average correct: 86.0050000000
epoch: 2, average loss: 0.3483632837, average correct: 87.2650000000
epoch: 3, average loss: 0.3277459771, average correct: 87.9200000000
epoch: 4, average loss: 0.3246190323, average correct: 88.0883333333
epoch: 5, average loss: 0.3123108984, average correct: 88.4750000000
epoch: 6, average loss: 0.3110108355, average correct: 88.7533333333
epoch: 7, average loss: 0.3071561582, average correct: 88.8133333333
epoch: 8, average loss: 0.2971438154, average correct: 88.9933333333
epoch: 9, average loss: 0.3020845860, average correct: 88.9533333333

-Run(lr=0.01, batch_size=100, shuffle=False, epoch=5, device='cuda')
epoch: 0, average loss: 0.5231307859, average correct: 80.5733333333
epoch: 1, average loss: 0.3721575903, average correct: 86.3266666667
epoch: 2, average loss: 0.3438053966, average correct: 87.2216666667
epoch: 3, average loss: 0.3264096088, average correct: 87.9566666667
epoch: 4, average loss: 0.3186341389, average correct: 88.1550000000

-Run(lr=0.01, batch_size=100, shuffle=False, epoch=10, device='cuda')
epoch: 0, average loss: 0.5866546286, average correct: 77.8483333333
epoch: 1, average loss: 0.3970952929, average correct: 85.3766666667
epoch: 2, average loss: 0.3621235035, average correct: 86.5833333333
epoch: 3, average loss: 0.3380681822, average correct: 87.5266666667
epoch: 4, average loss: 0.3294203746, average correct: 87.8300000000
epoch: 5, average loss: 0.3263481045, average correct: 87.8416666667
epoch: 6, average loss: 0.3185786353, average correct: 88.3183333333
epoch: 7, average loss: 0.3157054317, average correct: 88.3200000000
epoch: 8, average loss: 0.3133901746, average correct: 88.5316666667
epoch: 9, average loss: 0.3120996529, average correct: 88.5116666667

-Run(lr=0.01, batch_size=500, shuffle=True, epoch=5, device='cuda')
epoch: 0, average loss: 0.7892509239, average correct: 347.2750000000
epoch: 1, average loss: 0.4510333372, average correct: 416.3000000000
epoch: 2, average loss: 0.3713037593, average correct: 431.2583333333
epoch: 3, average loss: 0.3339587213, average correct: 437.6500000000
epoch: 4, average loss: 0.3173176558, average correct: 441.1083333333

-Run(lr=0.01, batch_size=500, shuffle=True, epoch=10, device='cuda')
epoch: 0, average loss: 0.7573039201, average correct: 354.0000000000
epoch: 1, average loss: 0.4501519705, average correct: 416.8416666667
epoch: 2, average loss: 0.3823612804, average correct: 429.3666666667
epoch: 3, average loss: 0.3457559437, average correct: 436.3666666667
epoch: 4, average loss: 0.3290857921, average correct: 438.6166666667
epoch: 5, average loss: 0.3094131328, average correct: 442.5833333333
epoch: 6, average loss: 0.2975341791, average correct: 444.1916666667
epoch: 7, average loss: 0.2881241533, average correct: 445.6333333333
epoch: 8, average loss: 0.2777063791, average correct: 447.7250000000
epoch: 9, average loss: 0.2732845400, average correct: 449.1750000000

-Run(lr=0.01, batch_size=500, shuffle=False, epoch=5, device='cuda')
epoch: 0, average loss: 0.7695579549, average correct: 353.5000000000
epoch: 1, average loss: 0.4608167340, average correct: 413.3000000000
epoch: 2, average loss: 0.3792866093, average correct: 429.6416666667
epoch: 3, average loss: 0.3401234890, average correct: 436.1416666667
epoch: 4, average loss: 0.3184403775, average correct: 439.4333333333

-Run(lr=0.01, batch_size=500, shuffle=False, epoch=10, device='cuda')
epoch: 0, average loss: 0.8925873876, average correct: 329.5416666667
epoch: 1, average loss: 0.4959731052, average correct: 408.4166666667
epoch: 2, average loss: 0.4143860998, average correct: 424.2666666667
epoch: 3, average loss: 0.3741927075, average correct: 431.2000000000
epoch: 4, average loss: 0.3524233627, average correct: 435.3083333333
epoch: 5, average loss: 0.3351472061, average correct: 437.7916666667
epoch: 6, average loss: 0.3246983716, average correct: 439.7833333333
epoch: 7, average loss: 0.3093644702, average correct: 442.6250000000
epoch: 8, average loss: 0.3039956290, average correct: 443.1333333333
epoch: 9, average loss: 0.2967160132, average correct: 444.8500000000

-Run(lr=0.005, batch_size=100, shuffle=True, epoch=5, device='cuda')
epoch: 0, average loss: 0.6199147650, average correct: 76.2400000000
epoch: 1, average loss: 0.3979324772, average correct: 85.2483333333
epoch: 2, average loss: 0.3547257959, average correct: 86.7416666667
epoch: 3, average loss: 0.3295163383, average correct: 87.7016666667
epoch: 4, average loss: 0.3138322780, average correct: 88.2683333333

-Run(lr=0.005, batch_size=100, shuffle=True, epoch=10, device='cuda')
epoch: 0, average loss: 0.6030920542, average correct: 77.0233333333
epoch: 1, average loss: 0.3808367471, average correct: 85.9150000000
epoch: 2, average loss: 0.3393732936, average correct: 87.4866666667
epoch: 3, average loss: 0.3147450837, average correct: 88.2833333333
epoch: 4, average loss: 0.3011448515, average correct: 88.7566666667
epoch: 5, average loss: 0.2857688793, average correct: 89.2666666667
epoch: 6, average loss: 0.2763760288, average correct: 89.6400000000
epoch: 7, average loss: 0.2677674725, average correct: 89.8633333333
epoch: 8, average loss: 0.2615216608, average correct: 90.0016666667
epoch: 9, average loss: 0.2529757225, average correct: 90.4666666667

-Run(lr=0.005, batch_size=100, shuffle=False, epoch=5, device='cuda')
epoch: 0, average loss: 0.6438212850, average correct: 75.6600000000
epoch: 1, average loss: 0.4311503610, average correct: 84.2750000000
epoch: 2, average loss: 0.3673974688, average correct: 86.3883333333
epoch: 3, average loss: 0.3370783451, average correct: 87.4566666667
epoch: 4, average loss: 0.3170995928, average correct: 88.1300000000

-Run(lr=0.005, batch_size=100, shuffle=False, epoch=10, device='cuda')
epoch: 0, average loss: 0.6074748931, average correct: 76.9666666667
epoch: 1, average loss: 0.3826813002, average correct: 85.7933333333
epoch: 2, average loss: 0.3360546749, average correct: 87.6816666667
epoch: 3, average loss: 0.3126871870, average correct: 88.4500000000
epoch: 4, average loss: 0.2962189508, average correct: 89.0433333333
epoch: 5, average loss: 0.2813491004, average correct: 89.4266666667
epoch: 6, average loss: 0.2728275024, average correct: 89.8183333333
epoch: 7, average loss: 0.2664490945, average correct: 89.9416666667
epoch: 8, average loss: 0.2537033156, average correct: 90.4100000000
epoch: 9, average loss: 0.2548490601, average correct: 90.4416666667

-Run(lr=0.005, batch_size=500, shuffle=True, epoch=5, device='cuda')
epoch: 0, average loss: 0.8625867729, average correct: 333.3500000000
epoch: 1, average loss: 0.5180763349, average correct: 403.0166666667
epoch: 2, average loss: 0.4419423719, average correct: 419.8916666667
epoch: 3, average loss: 0.3985803043, average correct: 427.3000000000
epoch: 4, average loss: 0.3671607743, average correct: 432.3166666667

-Run(lr=0.005, batch_size=500, shuffle=True, epoch=10, device='cuda')
epoch: 0, average loss: 0.8520629267, average correct: 340.1166666667
epoch: 1, average loss: 0.4742026764, average correct: 409.5500000000
epoch: 2, average loss: 0.3913912194, average correct: 427.4750000000
epoch: 3, average loss: 0.3563943662, average correct: 433.8916666667
epoch: 4, average loss: 0.3265873415, average correct: 439.5583333333
epoch: 5, average loss: 0.3126906040, average correct: 441.8333333333
epoch: 6, average loss: 0.2994723561, average correct: 444.2916666667
epoch: 7, average loss: 0.2842058177, average correct: 446.7250000000
epoch: 8, average loss: 0.2747114932, average correct: 448.5500000000
epoch: 9, average loss: 0.2636799593, average correct: 450.9416666667

-Run(lr=0.005, batch_size=500, shuffle=False, epoch=5, device='cuda')
epoch: 0, average loss: 0.8784389446, average correct: 333.3666666667
epoch: 1, average loss: 0.5245881908, average correct: 400.5166666667
epoch: 2, average loss: 0.4393276828, average correct: 420.1083333333
epoch: 3, average loss: 0.3916752331, average correct: 428.3000000000
epoch: 4, average loss: 0.3638193076, average correct: 432.9500000000

-Run(lr=0.005, batch_size=500, shuffle=False, epoch=10, device='cuda')
epoch: 0, average loss: 0.8776366348, average correct: 335.2666666667
epoch: 1, average loss: 0.5421455326, average correct: 397.2833333333
epoch: 2, average loss: 0.4568016176, average correct: 416.8500000000
epoch: 3, average loss: 0.4070300584, average correct: 426.6000000000
epoch: 4, average loss: 0.3699127187, average correct: 432.5750000000
epoch: 5, average loss: 0.3440336622, average correct: 436.9833333333
epoch: 6, average loss: 0.3256822807, average correct: 440.0166666667
epoch: 7, average loss: 0.3118922246, average correct: 442.4250000000
epoch: 8, average loss: 0.3020313072, average correct: 444.0000000000
epoch: 9, average loss: 0.2944910130, average correct: 445.3666666667
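
The "average correct" column in the logs above is the mean number of correct predictions per batch (`total_correct / len(train_loader)`), so values from runs with different batch sizes are not directly comparable. As a small sketch, dividing by the batch size turns it into an accuracy. `to_accuracy` is a hypothetical helper, and the conversion assumes every batch is full, which holds here because the 60000 training images divide evenly into batches of 100 and 500.

```python
def to_accuracy(avg_correct_per_batch, batch_size):
    """Convert the logged per-batch average into a fraction of correct predictions."""
    return avg_correct_per_batch / batch_size

# Final-epoch values copied from two of the logs above.
acc_bs100 = to_accuracy(87.1783333333, 100)   # lr=0.01, batch_size=100, shuffle=True, epoch=5
acc_bs500 = to_accuracy(441.1083333333, 500)  # lr=0.01, batch_size=500, shuffle=True, epoch=5

print(f"{acc_bs100:.4f}")  # 0.8718
print(f"{acc_bs500:.4f}")  # 0.8822
```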




4.3.Data Loading - Deep Learning Speed Limit Increase


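DataLoader has a parameter called num_workers, which sets how many worker processes are used to load data (0 means the data is loaded in the main process). The code is modified as follows: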

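# Values reconstructed from the run output shown after this code.
params = OrderedDict(
    lr=[.01],
    batch_size=[100],
    shuffle=[True],
    epoch=[5],
    device=["cuda"],
    num_workers=[0, 1, 2, 4, 8, 16],
)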

def main():
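    # First pass: list every run that will be executed.
    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'
        print(comment)

    # Second pass: train one network per run.
    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'
        print(comment)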

        network = Network()

        train_loader = torch.utils.data.DataLoader(train_set, batch_size=run.batch_size, shuffle=run.shuffle, num_workers=run.num_workers)  # step 1: Get batch from the training set.
        optimizer = optim.Adam(network.parameters(), lr=run.lr)

        device = run.device
        network.to(device)  # move the model to the device the batches are sent to

        train_start_time = time.time()
        for epoch in range(run.epoch):
            total_loss = 0
            total_correct = 0

            for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
                images, labels = batch
                images, labels = images.to(device), labels.to(device)

                preds = network(images)  # step 2: Pass batch to network.
                loss = F.cross_entropy(preds, labels)  # step 3: Calculate the loss(difference between the predicted values and the true values).

                optimizer.zero_grad()  # clear the gradients left over from the previous batch
                loss.backward()  # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
                optimizer.step()  # step 5: Update the weights using the gradients to reduce the loss.

                total_loss += loss.item()
                total_correct += get_num_correct(preds, labels)

            print(f"epoch: {epoch}, average loss: {total_loss / len(train_loader):.10f}, average correct: {total_correct / len(train_loader):.10f}")
        train_end_time = time.time()
        print(f"train time: {train_end_time - train_start_time:.10f}")

if __name__ == '__main__':
    main()

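To avoid errors with multiprocessing on Windows, the multi-process logic has to be placed inside the main() function. I also added a small piece of logic that measures the training time, which requires an extra import time. Here is the difference as shown by Git: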


-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda', num_workers=0)
epoch: 0, average loss: 0.5889489330, average correct: 77.6366666667
epoch: 1, average loss: 0.3963580837, average correct: 85.3083333333
epoch: 2, average loss: 0.3774351805, average correct: 86.1550000000
epoch: 3, average loss: 0.3641082152, average correct: 86.5716666667
epoch: 4, average loss: 0.3537526523, average correct: 86.9150000000
train time: 25.0075454712

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda', num_workers=1)
epoch: 0, average loss: 0.5542732583, average correct: 78.9266666667
epoch: 1, average loss: 0.3714919458, average correct: 86.3066666667
epoch: 2, average loss: 0.3456784629, average correct: 87.3250000000
epoch: 3, average loss: 0.3339744434, average correct: 87.8033333333
epoch: 4, average loss: 0.3226663608, average correct: 88.0750000000
train time: 43.3679795265

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda', num_workers=2)
epoch: 0, average loss: 0.5282431153, average correct: 80.0533333333
epoch: 1, average loss: 0.3752922985, average correct: 85.9616666667
epoch: 2, average loss: 0.3499810995, average correct: 86.8783333333
epoch: 3, average loss: 0.3324256434, average correct: 87.7316666667
epoch: 4, average loss: 0.3266595144, average correct: 87.8866666667
train time: 39.0620391369

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda', num_workers=4)
epoch: 0, average loss: 0.5689649721, average correct: 78.4050000000
epoch: 1, average loss: 0.3926167285, average correct: 85.4866666667
epoch: 2, average loss: 0.3640779927, average correct: 86.4366666667
epoch: 3, average loss: 0.3475985242, average correct: 87.1316666667
epoch: 4, average loss: 0.3318750289, average correct: 87.7633333333
train time: 39.0912690163

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda', num_workers=8)
epoch: 0, average loss: 0.5755973245, average correct: 78.3383333333
epoch: 1, average loss: 0.3899452185, average correct: 85.6650000000
epoch: 2, average loss: 0.3616473807, average correct: 86.6100000000
epoch: 3, average loss: 0.3443661022, average correct: 87.1650000000
epoch: 4, average loss: 0.3380884284, average correct: 87.5900000000
train time: 48.0404033661

-Run(lr=0.01, batch_size=100, shuffle=True, epoch=5, device='cuda', num_workers=16)
epoch: 0, average loss: 0.5969337454, average correct: 77.3150000000
epoch: 1, average loss: 0.4035679932, average correct: 85.0200000000
epoch: 2, average loss: 0.3586135743, average correct: 86.8533333333
epoch: 3, average loss: 0.3403906616, average correct: 87.4583333333
epoch: 4, average loss: 0.3330876943, average correct: 87.8000000000
train time: 68.1281514168

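As you can see, ordered from fastest to slowest, the runs are num_workers=0, 2, 4, 1, 8, 16. This is odd and does not match the course at all. According to the video, num_workers=1 should ideally be the fastest: while one worker performs the forward and backward passes, the other loads data, and alternating like this should be very efficient. With num_workers=0, one person cannot work as fast as two working in parallel; with num_workers greater than 1, someone should always be idle. So why does this happen?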
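
The reasoning above can be written down as a toy timing model. This is only a sketch under stated assumptions, not a description of how DataLoader actually schedules work: `load` and `compute` are made-up per-batch costs, loading is assumed to split evenly across workers, and `startup` is a hypothetical fixed cost paid for each worker in each epoch.

```python
def epoch_time(num_batches, load, compute, num_workers, startup=0.0):
    """Estimate the time of one epoch in a simplified loader/trainer model."""
    if num_workers == 0:
        # The main process loads a batch, then trains on it, one after the other.
        return num_batches * (load + compute)
    # Workers load in the background, so each step costs the slower of the two stages.
    per_step = max(compute, load / num_workers)
    return num_workers * startup + num_batches * per_step

# Illustrative numbers only: 600 batches, loading 4x slower than compute, no startup cost.
for workers in (0, 1, 2, 4, 8, 16):
    print(workers, round(epoch_time(600, load=0.004, compute=0.001, num_workers=workers), 3))

# With a per-worker startup cost, extra workers eventually make the epoch slower again.
for workers in (0, 1, 2, 4, 8, 16):
    print(workers, round(epoch_time(600, 0.004, 0.001, workers, startup=0.3), 3))
```

In this model the video's prediction (num_workers=1 is enough) holds only when loading a batch costs no more than computing on it. When loading is slower, more workers help until the startup cost outweighs the gain. Whether either case matches a real machine has to be measured.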


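  1. CPU and GPU resource contention

With num_workers > 0, PyTorch loads data in parallel using multiple processes, which increases the load on the CPU. Data loading can become the bottleneck, especially when GPU and CPU resources are limited.

  • Competition between GPU computation and CPU data loading: the GPU accelerates computation, but multi-threaded/multi-process data loading keeps the CPU busy, particularly when preprocessing is complex or there are too few CPU cores. The workers also have to communicate constantly with the main process, which adds overhead.
  • IO bottleneck: if the data is read from a hard disk rather than an SSD, IO becomes the bottleneck. As num_workers grows, several processes may access the disk at the same time, which congests IO and reduces performance.
  2. Scheduling overhead from too many processes

With multiple workers, PyTorch creates one process per worker to load data. Each process takes time to start and to manage. More workers also means noticeably more overhead for communication and synchronization between processes, which hurts performance.

  • Process management and communication: in multi-process mode the workers have to pass data and synchronize, and the cost of managing Python processes rises when num_workers is high. This cost can cancel out the speed-up from parallel loading.
  • multiprocessing on Windows: on Windows, multiprocessing brings extra overhead. Python has to re-import the modules in every process, which can reduce performance.
  3. Data loading speed and network bandwidth limits

If the dataset is small or fast to read, a high num_workers can leave surplus processes waiting for data or sitting idle, which causes excessive CPU context switching, memory cache overflow and similar problems.

  4. Memory bandwidth or cache issues

With multiple workers, loading and processing data can occupy memory frequently, especially when the dataset is very large. Memory bandwidth can become the bottleneck, causing cache misses and waits for data, which ultimately slows down training.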



| Device | num_workers=0 | num_workers=1 | num_workers=2 | num_workers=4 | num_workers=8 | num_workers=16 |
| --- | --- | --- | --- | --- | --- | --- |
| CPU | 22.7454195023 | 38.8810143471 | 39.3634419441 | 39.1821424961 | 50.9369196892 | 66.9246480465 |
| GPU | 21.2091612816 | 37.8195934296 | 33.8372309208 | 33.2044763565 | 41.2324335575 | 55.7895495892 |

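This time, ordered from fastest to slowest, the result is num_workers=0, 4, 2, 1, 8, 16 (GPU row). This suggests that GPT was mostly making things up, and now the result cannot be explained at all.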

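For comparison, I also rented a cloud server to see how training behaves on a full-power RTX 4090 (24GB). To save time I tested only the GPU. The results are as follows: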

| Device | num_workers=0 | num_workers=1 | num_workers=2 | num_workers=4 | num_workers=8 | num_workers=16 |
| --- | --- | --- | --- | --- | --- | --- |
| GPU | 19.8094530106 | 18.6171827316 | 10.8715770245 | 10.2341358662 | 11.0814673901 | 12.2236852646 |

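Ordered from fastest to slowest, the result is num_workers=4, 2, 8, 16, 1, 0. This is neither the order the video predicts nor the order I just measured locally, and it is neither ascending nor descending, which leaves me at a loss...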


| dv | lr | bs | ep | num=0 | num=1 | num=2 | num=4 | num=8 | num=16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU | 0.01 | 100 | 5 | 19.8094530106 | 18.6171827316 | 10.8715770245 | 10.2341358662 | 11.0814673901 | 12.2236852646 |
| GPU | 0.01 | 100 | 10 | 39.1413171291 | 37.4858777523 | 22.0164945126 | 20.8994820118 | 22.1796212196 | 23.8738613129 |
| GPU | 0.01 | 100 | 15 | 57.7995896339 | 54.7732126713 | 31.0987980366 | 30.4089231491 | 32.7723743916 | 35.4690580368 |
| GPU | 0.01 | 100 | 20 | 77.1752855778 | 75.3958535194 | 41.2066097260 | 41.6991536617 | 44.0119616985 | 48.2091443539 |
| GPU | 0.01 | 200 | 5 | 16.7913942337 | 16.7396118641 | 9.9124171734 | 5.8958966732 | 5.9714968204 | 7.0907850266 |
| GPU | 0.01 | 200 | 10 | 31.5285310745 | 34.1176443100 | 18.5157856941 | 11.9089007378 | 12.8334674835 | 14.0546066761 |
| GPU | 0.01 | 200 | 15 | 47.4661595821 | 51.3070929050 | 27.9936087132 | 17.6425700188 | 19.1162984371 | 22.1769638062 |
| GPU | 0.01 | 200 | 20 | 64.2943732738 | 67.6741607189 | 38.3087828159 | 23.1501190662 | 26.6771740913 | 38.7755141258 |

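Here dv is the device, lr is the learning rate, bs is the batch size, ep is the number of epochs, and each num column gives the training time in seconds for that value of num_workers.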

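Set the local machine aside for now and look at the cloud server. The times are neither ascending nor descending, but there is a clear pattern: except for learning rate=0.01, batch size=100, epoch=20, every run is fastest at num_workers=4, with the time generally falling as num_workers rises to 4 and rising again beyond it. I don't know the reason, but that is what the data shows. If any reader can explain this behavior, I would be very grateful.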
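
The pattern can be checked mechanically. The sketch below copies the cloud-server times from the table above (rounded to two decimals for brevity) and reports the fastest num_workers for each row. The variable names are illustrative only.

```python
worker_counts = [0, 1, 2, 4, 8, 16]

# (batch size, epochs) -> training times in seconds, rounded from the table above.
times = {
    (100, 5):  [19.81, 18.62, 10.87, 10.23, 11.08, 12.22],
    (100, 10): [39.14, 37.49, 22.02, 20.90, 22.18, 23.87],
    (100, 15): [57.80, 54.77, 31.10, 30.41, 32.77, 35.47],
    (100, 20): [77.18, 75.40, 41.21, 41.70, 44.01, 48.21],
    (200, 5):  [16.79, 16.74, 9.91, 5.90, 5.97, 7.09],
    (200, 10): [31.53, 34.12, 18.52, 11.91, 12.83, 14.05],
    (200, 15): [47.47, 51.31, 27.99, 17.64, 19.12, 22.18],
    (200, 20): [64.29, 67.67, 38.31, 23.15, 26.68, 38.78],
}

fastest_by_config = {}
for (batch_size, epochs), row in times.items():
    fastest = worker_counts[row.index(min(row))]
    fastest_by_config[(batch_size, epochs)] = fastest
    speedup = row[0] / min(row)  # relative to num_workers=0
    print(f"bs={batch_size} ep={epochs}: fastest num_workers={fastest}, "
          f"{speedup:.2f}x faster than num_workers=0")
```

The only row where num_workers=4 is not the fastest is batch size 100 with 20 epochs, where num_workers=2 is ahead by about half a second.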

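I came up with two hypotheses, one after the other, but control experiments have disproved both. I include them only for the record, so feel free to skip them.

Hypothesis 1: perhaps on the server, reading train_loader is far slower than training for 5 epochs, by a factor of about 4. With num_workers=4 the job would then run like a 5-stage (4+1) pipeline: 4 workers read train_loader while 1 worker performs the forward and backward passes, which would be the most efficient arrangement. With num_workers<4 the work cannot keep up, and with num_workers>4 some workers are always idle. But when I re-read the code, I remembered that the measured time does not include reading train_loader in the first place, so this cannot be right. I measured it anyway: both locally and on the server, reading train_loader takes less than 1 ms, so it cannot be consuming much time.

Hypothesis 2: perhaps with num_workers<4 the GPU's resources are not fully used, while with num_workers>4 the GPU is already at 100%, so the load grows with num_workers and training takes longer and longer. To check, I set learning rate to 0.01, batch size to 100 and epoch to 5, and changed only num_workers, while monitoring GPU utilization in AutoPanel on the server. It turned out that none of the GPU, CPU, GPU memory or RAM was saturated, and there was plenty of headroom. Very strange.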
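
One detail is worth keeping in mind when timing the creation of a loader: constructing a `DataLoader` does not load any batches. The data is fetched lazily while the loop iterates over the loader, and that is also when worker processes are started. The sketch below illustrates this with a small synthetic `TensorDataset` (an assumption for illustration, not FashionMNIST) and `num_workers=0` so that it runs on any platform.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 6000 grayscale 28x28 "images".
images = torch.rand(6000, 1, 28, 28)
labels = torch.randint(0, 10, (6000,))
dataset = TensorDataset(images, labels)

start = time.time()
loader = DataLoader(dataset, batch_size=100, shuffle=True, num_workers=0)
construct_time = time.time() - start  # only builds the object, no batch is read

start = time.time()
num_batches = 0
for batch_images, batch_labels in loader:  # batches are produced here
    num_batches += 1
iterate_time = time.time() - start

print(f"construct: {construct_time:.6f}s, iterate {num_batches} batches: {iterate_time:.6f}s")
```

A timer wrapped around the training loop therefore includes the data-loading time, while a timer wrapped around the `DataLoader` constructor does not measure it.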


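Here are the other tests I ran. RTX 4090(24GB) and RTX 3090(24GB) are tests on cloud servers, and RTX 4070(16GB) Laptop is the test on my local machine. All times are training times in seconds: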

| dv | lr | bs | ep | num=0 | num=1 | num=2 | num=4 | num=8 | num=16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTX 4070(16GB) Laptop | 0.01 | 100 | 5 | 28.1305768490 | 41.7332050800 | 36.2673246861 | 34.3168334961 | 39.9641251564 | 57.6469037533 |
| RTX 4070(16GB) Laptop | 0.01 | 100 | 10 | 57.8800597191 | 81.7754955292 | 73.1815102100 | 65.6602911949 | 77.4903655052 | 118.0588462353 |
| RTX 4070(16GB) Laptop | 0.01 | 100 | 15 | 99.5685005188 | 130.0420460701 | 111.5567693710 | 107.4449329376 | 132.9369020462 | 182.7950754166 |
| RTX 4070(16GB) Laptop | 0.01 | 100 | 20 | 141.9266636372 | 170.4486801624 | 159.7556915283 | 146.8164608479 | 172.9068100452 | 230.8282351494 |
| RTX 3090(24GB) | 0.01 | 100 | 5 | 29.5825061798 | 26.8633742332 | 17.8401007652 | 19.9198176861 | 19.0827801228 | 21.7736520767 |
| RTX 3090(24GB) | 0.01 | 100 | 10 | 58.3652188778 | 53.4312276840 | 36.3988366127 | 35.5191409588 | 36.3807988167 | 38.4931862354 |
| RTX 3090(24GB) | 0.01 | 100 | 15 | 88.5150232315 | 79.5371186733 | 59.6383686066 | 53.8375451565 | 54.5288057327 | 63.0430862904 |
| RTX 3090(24GB) | 0.01 | 100 | 20 | 117.6886110306 | 105.9683313370 | 83.1662738323 | 83.4657490253 | 83.4540984631 | 87.3529927731 |
| RTX 3090(24GB) | 0.01 | 200 | 5 | 25.3283810616 | 24.0916061401 | 13.5204551220 | 12.0049178600 | 11.2450921535 | 12.2135033607 |
| RTX 3090(24GB) | 0.01 | 200 | 10 | 48.6335105896 | 48.8655605316 | 26.1286144257 | 21.2161135674 | 21.9887542725 | 23.7948474884 |
| RTX 3090(24GB) | 0.01 | 200 | 15 | 72.6389396191 | 73.0067055225 | 38.1936340332 | 33.8487656116 | 30.4887382984 | 33.7346482277 |
| RTX 3090(24GB) | 0.01 | 200 | 20 | 97.3576982021 | 96.1663630009 | 54.2182519436 | 43.7375555038 | 43.9286410809 | 50.3938348293 |
| RTX 4090(24GB) | 0.01 | 100 | 5 | 19.8094530106 | 18.6171827316 | 10.8715770245 | 10.2341358662 | 11.0814673901 | 12.2236852646 |
| RTX 4090(24GB) | 0.01 | 100 | 10 | 39.1413171291 | 37.4858777523 | 22.0164945126 | 20.8994820118 | 22.1796212196 | 23.8738613129 |
| RTX 4090(24GB) | 0.01 | 100 | 15 | 57.7995896339 | 54.7732126713 | 31.0987980366 | 30.4089231491 | 32.7723743916 | 35.4690580368 |
| RTX 4090(24GB) | 0.01 | 100 | 20 | 77.1752855778 | 75.3958535194 | 41.2066097260 | 41.6991536617 | 44.0119616985 | 48.2091443539 |
| RTX 4090(24GB) | 0.01 | 200 | 5 | 16.7913942337 | 16.7396118641 | 9.9124171734 | 5.8958966732 | 5.9714968204 | 7.0907850266 |
| RTX 4090(24GB) | 0.01 | 200 | 10 | 31.5285310745 | 34.1176443100 | 18.5157856941 | 11.9089007378 | 12.8334674835 | 14.0546066761 |
| RTX 4090(24GB) | 0.01 | 200 | 15 | 47.4661595821 | 51.3070929050 | 27.9936087132 | 17.6425700188 | 19.1162984371 | 22.1769638062 |
| RTX 4090(24GB) | 0.01 | 200 | 20 | 64.2943732738 | 67.6741607189 | 38.3087828159 | 23.1501190662 | 26.6771740913 | 38.7755141258 |
| RTX 4090(24GB) | 0.01 | 100 | 100 | 394.7647271156 | 390.7899484634 | 218.7683389187 | 212.3276288509 | 221.9496636391 | 242.6673545837 |
| RTX 4090(24GB) | 0.01 | 100 | 150 | 552.9656944275 | 536.8140552044 | 308.1492588520 | 312.5498087406 | 330.0487678051 | 375.0175244808 |
| RTX 4090(24GB) | 0.01 | 100 | 200 | 739.6282594204 | 733.0374395847 | 405.3113956451 | 418.5048103333 | 437.4574213028 | 477.0259277821 |
| RTX 4090(24GB) | 0.01 | 200 | 100 | 338.4314348698 | 359.8767967224 | 199.4675915241 | 121.1487936974 | 136.2409780025 | 173.8709726334 |
| RTX 4090(24GB) | 0.01 | 200 | 150 | 472.6995038986 | 499.8453207016 | 282.8549370766 | 189.5168559551 | 200.7765417099 | 236.5929512978 |
| RTX 4090(24GB) | 0.01 | 200 | 200 | 675.4776511192 | 721.8856656551 | 395.7786476612 | 243.3258059025 | 263.1972572803 | 310.2211456299 |

4.4.On the GPU - Training Neural Networks with CUDA


4.5.Data Normalization - Normalize a Dataset


Note: the code in this lesson sets num_workers. If you run it on Windows, either remove that argument or move this part of the code into a main function.


import torch

import torchvision
import torchvision.transforms as transforms

import matplotlib.pyplot as plt

train_set = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([


# Easy way: Calculate the mean and standard deviation using the torch method
loader = torch.utils.data.DataLoader(train_set, batch_size=len(train_set), num_workers=4)
data = next(iter(loader))
print(data[0].mean(), data[0].std())

# Harder way: Calculate the mean and standard deviation by hand
loader = torch.utils.data.DataLoader(train_set, batch_size=len(train_set), num_workers=4)
num_of_pixels = len(train_set) * 28 * 28

total_sum = 0
for batch in loader:
    total_sum += batch[0].sum()
mean = total_sum / num_of_pixels

sum_of_squared_error = 0
for batch in loader:
    sum_of_squared_error += ((batch[0] - mean) ** 2).sum()
std = torch.sqrt(sum_of_squared_error / num_of_pixels)

print(mean, std)

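Note: num_workers=4 is set here. If you run this on Windows, either remove it or move this part of the code into a main function! The same applies to the rest of the code, and I won't repeat this warning.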


tensor(0.2860) tensor(0.3530)
tensor(0.2860) tensor(0.3530)



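Note the size of data[0]: it is a 60000*1*28*28 tensor. The horizontal axis shows the pixel value (FashionMNIST is a grayscale dataset, so every value lies between 0 and 1), the vertical axis shows the count ($\times 10^7$), and the vertical line marks the mean.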


train_set_normal = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([
        # normalize
        transforms.Normalize(data[0].mean(), data[0].std())

# Easy way: Calculate the mean and standard deviation using the torch method
loader_normal = torch.utils.data.DataLoader(train_set_normal, batch_size=len(train_set_normal), num_workers=4)
data_normal = next(iter(loader_normal))
print(data_normal[0].mean(), data_normal[0].std())



tensor(0.2860) tensor(0.3530)
tensor(-2.9064e-08) tensor(1.)
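
`transforms.Normalize(mean, std)` computes `(x - mean) / std` for each channel, which is why the second line of the output is approximately 0 and 1. Here is a minimal sketch of the same arithmetic with plain tensor operations. The random tensor is only a stand-in for `data[0]`, assumed for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.rand(1000, 1, 28, 28)  # values in [0, 1), similar to the pixel range described above

mean, std = x.mean(), x.std()
x_normal = (x - mean) / std  # the same arithmetic Normalize applies per channel

print(x_normal.mean(), x_normal.std())
```

The mean is not exactly zero because of floating-point rounding, just like the `-2.9064e-08` in the output above.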




import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # short for optimizer. This can give us access to the optimizer we will use to update weights.

import torchvision
import torchvision.transforms as transforms

from collections import OrderedDict
from collections import namedtuple
from itertools import product

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

class RunBuilder:
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())
        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))  # one Run per hyperparameter combination
        return runs


train_set_not_normal = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([
        # normalize


# Easy way: Calculate the mean and standard deviation using the torch method
loader = torch.utils.data.DataLoader(train_set_not_normal, batch_size=len(train_set_not_normal), num_workers=4)
data = next(iter(loader))

train_set_normal = torchvision.datasets.FashionMNIST(
    , train=True
    , download=True
    , transform=transforms.Compose([
        # normalize
        transforms.Normalize((data[0].mean()), (data[0].std()))

# Easy way: Calculate the mean and standard deviation using the torch method
loader_normal = torch.utils.data.DataLoader(train_set_normal, batch_size=len(train_set_normal), num_workers=4)
data_normal = next(iter(loader_normal))

class Network(nn.Module):
    def __init__(self):
        super().__init__()  # nn.Module must be initialized before layers are assigned
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        return t

params = OrderedDict(
    train_set=[train_set_not_normal, train_set_normal],

def main():
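    # First pass: list every run that will be executed.
    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'
        print(comment)

    # Second pass: train one network per run.
    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'
        print(comment)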

        network = Network()

        reading_start_time = time.time()
        train_loader = torch.utils.data.DataLoader(dataset=run.train_set,
                                                   num_workers=run.num_workers)  # step 1: Get batch from the training set.
        optimizer = optim.Adam(network.parameters(), lr=run.lr)

        device = run.device
        network.to(device)  # move the model to the device the batches are sent to
        reading_end_time = time.time()
        # print(f"reading train loader time: {reading_end_time - reading_start_time:.10f}")

        train_start_time = time.time()
        for epoch in range(run.epoch):  # step 7: Repeat steps 1-6 for as many epochs required to obtain the desired level of accuracy.
            total_loss = 0
            total_correct = 0

            for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
                images, labels = batch
                images, labels = images.to(device), labels.to(device)

                preds = network(images)  # step 2: Pass batch to network.
                loss = F.cross_entropy(preds, labels)  # step 3: Calculate the loss(difference between the predicted values and the true values).

                optimizer.zero_grad()  # clear the gradients left over from the previous batch
                loss.backward()  # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
                optimizer.step()  # step 5: Update the weights using the gradients to reduce the loss.

                total_loss += loss.item()
                total_correct += get_num_correct(preds, labels)

            print(
                f"epoch: {epoch}, average loss: {total_loss / len(train_loader):.10f}, average correct: {total_correct / len(train_loader):.10f}")
        train_end_time = time.time()
        print(f"train time: {train_end_time - train_start_time:.10f}")

if __name__ == '__main__':
    main()




python
def main():
    # for run in RunBuilder.get_runs(params):
    #     comment = f'-{run}'
    #     print(comment)

    results = []

    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'

        network = Network()

        reading_start_time = time.time()
        train_loader = torch.utils.data.DataLoader(dataset=run.train_set,
                                                   batch_size=run.batch_size,
                                                   num_workers=run.num_workers)  # step 1: Get batch from the training set.
        optimizer = optim.Adam(network.parameters(), lr=run.lr)

        device = run.device
        reading_end_time = time.time()
        # print(f"reading train loader time: {reading_end_time - reading_start_time:.10f}")

        train_start_time = time.time()
        for epoch in range(run.epoch):  # step 7: Repeat steps 1-6 for as many epochs required to obtain the desired level of accuracy.
            total_loss = 0
            total_correct = 0

            for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
                images, labels = batch
                images, labels = images.to(device), labels.to(device)

                preds = network(images)  # step 2: Pass batch to network.
                loss = F.cross_entropy(preds, labels)  # step 3: Calculate the loss(difference between the predicted values and the true values).

                optimizer.zero_grad()  # clear the gradients left over from the previous batch
                loss.backward()  # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
                optimizer.step()  # step 5: Update the weights using the gradients to reduce the loss.

                total_loss += loss.item()
                total_correct += get_num_correct(preds, labels)

            print(f"epoch: {epoch:3d}, average loss: {total_loss / len(train_loader):.10f}, average correct: {total_correct / len(train_loader):.10f}")

            results.append({
                'epoch': epoch,
                'average_loss': (total_loss / len(train_loader)),
                'average_correct': (total_correct / len(train_loader)),
                'train_set': 'not normal' if run.train_set == train_set_not_normal else 'normal'
            })

        train_end_time = time.time()
        # print(f"train time: {train_end_time - train_start_time:.10f}")

    sorted_results = sorted(results, key=lambda x: x['average_correct'], reverse=True)

    print("\nSorted Results (by average correct):")
    for result in sorted_results:
        print(f"epoch: {result['epoch']:3d}, average loss: {result['average_loss']:.10f}, average correct: {result['average_correct']:.10f}, train_set: {result['train_set']}")


Sorted Results (by average correct):
epoch:  89, average loss: 0.2444642815, average correct: 91.3700000000, train_set: not normal
epoch:  94, average loss: 0.2436240145, average correct: 91.3250000000, train_set: not normal
epoch:  86, average loss: 0.2446894638, average correct: 91.2366666667, train_set: not normal
epoch:  95, average loss: 0.2503509281, average correct: 91.1933333333, train_set: not normal
epoch:  98, average loss: 0.2522871471, average correct: 91.1616666667, train_set: not normal
epoch:  93, average loss: 0.2487666017, average correct: 91.1300000000, train_set: not normal
epoch:  60, average loss: 0.2588673048, average correct: 91.0916666667, train_set: normal
epoch:  96, average loss: 0.2566552062, average correct: 91.0883333333, train_set: not normal
epoch:  88, average loss: 0.2468521763, average correct: 91.0700000000, train_set: not normal
epoch:  83, average loss: 0.2509987283, average correct: 91.0316666667, train_set: not normal
epoch:  92, average loss: 0.2598962237, average correct: 90.9683333333, train_set: not normal
epoch:  81, average loss: 0.2532719481, average correct: 90.9550000000, train_set: not normal
epoch:  70, average loss: 0.2449382489, average correct: 90.9533333333, train_set: not normal
epoch:  87, average loss: 0.2578270657, average correct: 90.9316666667, train_set: not normal
epoch:  90, average loss: 0.2609930170, average correct: 90.8933333333, train_set: not normal
epoch:  69, average loss: 0.2517988028, average correct: 90.8916666667, train_set: not normal
epoch:  67, average loss: 0.2641334045, average correct: 90.8883333333, train_set: normal
epoch:  59, average loss: 0.2651573863, average correct: 90.8733333333, train_set: normal
epoch:  76, average loss: 0.2493317663, average correct: 90.8700000000, train_set: not normal
epoch:  99, average loss: 0.2645902622, average correct: 90.8566666667, train_set: not normal
epoch:  82, average loss: 0.2623617617, average correct: 90.8383333333, train_set: not normal
epoch:  97, average loss: 0.2615160028, average correct: 90.8366666667, train_set: not normal

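It looks like the effort was wasted: at least with learning rate = 0.01, batch size = 100, and 100 epochs, training on the raw dataset gives the better result, which is the opposite of the original video's conclusion. That said, the video also points out that for different datasets, network architectures, and hyperparameters, neither the normalized dataset nor the raw one is guaranteed to train better; you have to run the experiment to find out.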
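To make the comparison concrete, normalizing a dataset amounts to subtracting its mean and dividing by its standard deviation. The following is a minimal sketch of that arithmetic, assuming a random tensor `fake_images` as a stand-in for the stacked FashionMNIST images (hypothetical data, so the printed numbers will not match the real dataset):

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for the stacked training images: (N, C, H, W), values in [0, 1).
fake_images = torch.rand(1000, 1, 28, 28)

mean = fake_images.mean()
std = fake_images.std()

# The same arithmetic that transforms.Normalize applies: (x - mean) / std.
normalized = (fake_images - mean) / std

print(f"before: mean={mean.item():.4f}, std={std.item():.4f}")
print(f"after:  mean={normalized.mean().item():.4f}, std={normalized.std().item():.4f}")
```

After the transform the data should have a mean close to 0 and a standard deviation close to 1; whether that helps training is exactly what the experiment above tests.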


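python
params = OrderedDict(
    train_set=[train_set_not_normal, train_set_normal],
    lr=[.01, .005],
    batch_size=[100, 200],
    epoch=[200],
    num_workers=[1],  # assumed value
    device=['cpu'],  # assumed value
)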

I then replaced the average correct metric with accuracy. The main function is as follows:

python
def main():
    # for run in RunBuilder.get_runs(params):
    #     comment = f'-{run}'
    #     print(comment)

    results = []

    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'

        network = Network()

        reading_start_time = time.time()
        train_loader = torch.utils.data.DataLoader(dataset=run.train_set,
                                                   batch_size=run.batch_size,
                                                   num_workers=run.num_workers)  # step 1: Get batch from the training set.
        optimizer = optim.Adam(network.parameters(), lr=run.lr)

        device = run.device
        reading_end_time = time.time()
        # print(f"reading train loader time: {reading_end_time - reading_start_time:.10f}")

        train_start_time = time.time()
        for epoch in range(run.epoch):  # step 7: Repeat steps 1-6 for as many epochs required to obtain the desired level of accuracy.
            total_loss = 0
            total_correct = 0

            for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
                images, labels = batch
                images, labels = images.to(device), labels.to(device)

                preds = network(images)  # step 2: Pass batch to network.
                loss = F.cross_entropy(preds, labels)  # step 3: Calculate the loss(difference between the predicted values and the true values).

                optimizer.zero_grad()  # clear the gradients left over from the previous batch
                loss.backward()  # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
                optimizer.step()  # step 5: Update the weights using the gradients to reduce the loss.

                total_loss += loss.item()
                total_correct += get_num_correct(preds, labels)

            print(f"epoch: {epoch:3d}, average loss: {total_loss / len(train_loader):.10f}, average correct: {total_correct / len(train_loader):.10f}")

            results.append({
                'epoch': epoch,
                'lr': run.lr,
                'batch_size': run.batch_size,
                'average_loss': (total_loss / len(train_loader)),
                'accuracy': (total_correct / len(train_loader) / run.batch_size),
                'train_set': 'not normal' if run.train_set == train_set_not_normal else 'normal'
            })

        train_end_time = time.time()
        print(f"train time: {train_end_time - train_start_time:.10f}")

    sorted_results = sorted(results, key=lambda x: x['accuracy'], reverse=True)

    print("\nSorted Results (by average correct):")
    for result in sorted_results:
        print(f"epoch: {result['epoch']:3d}, learning rate: {result['lr']:.3f}, batch size: {result['batch_size']}, average loss: {result['average_loss']:.10f}, accuracy: {result['accuracy']:.4f}, train_set: {result['train_set']}")
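The accuracy expression in `main` divides the per-batch average of correct predictions by the batch size. Here is a small sketch of that arithmetic, using made-up counts (`total_correct = 57000` is an illustrative value, not a result from this experiment, and both helper functions are hypothetical). It shows that the expression equals `total_correct / dataset_size` only when the batch size divides the dataset size evenly:

```python
import math

def accuracy_via_batches(total_correct, dataset_size, batch_size):
    # len(train_loader) is the number of batches; with the default
    # drop_last=False a final partial batch still counts as one batch.
    num_batches = math.ceil(dataset_size / batch_size)
    return total_correct / num_batches / batch_size

def accuracy_direct(total_correct, dataset_size):
    return total_correct / dataset_size

# 60000 samples with batch size 200 gives 300 full batches: both formulas agree.
print(accuracy_via_batches(57000, 60000, 200), accuracy_direct(57000, 60000))

# With batch size 256 the last batch is partial, so the batch-based value is lower.
print(accuracy_via_batches(57000, 60000, 256), accuracy_direct(57000, 60000))
```

With the batch sizes used here (100 and 200) and 60000 training images, the two forms agree, so the reported accuracies are not affected.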


Sorted Results (by average correct):
epoch: 174, learning rate: 0.005, batch size: 200, average loss: 0.0709776129, accuracy: 0.9743, train_set: not normal
epoch: 193, learning rate: 0.005, batch size: 200, average loss: 0.0774264681, accuracy: 0.9724, train_set: not normal
epoch: 192, learning rate: 0.005, batch size: 200, average loss: 0.0803019308, accuracy: 0.9714, train_set: not normal
epoch: 189, learning rate: 0.005, batch size: 200, average loss: 0.0776055988, accuracy: 0.9713, train_set: not normal
epoch: 184, learning rate: 0.005, batch size: 200, average loss: 0.0794412048, accuracy: 0.9712, train_set: not normal
epoch: 184, learning rate: 0.005, batch size: 200, average loss: 0.0787860808, accuracy: 0.9711, train_set: normal
epoch: 180, learning rate: 0.005, batch size: 200, average loss: 0.0827256603, accuracy: 0.9709, train_set: not normal
epoch: 199, learning rate: 0.005, batch size: 200, average loss: 0.0837867468, accuracy: 0.9709, train_set: normal
epoch: 178, learning rate: 0.005, batch size: 200, average loss: 0.0792716932, accuracy: 0.9707, train_set: not normal
epoch: 188, learning rate: 0.005, batch size: 200, average loss: 0.0819608879, accuracy: 0.9707, train_set: not normal
epoch: 182, learning rate: 0.005, batch size: 200, average loss: 0.0838116156, accuracy: 0.9705, train_set: not normal
epoch: 155, learning rate: 0.005, batch size: 200, average loss: 0.0803890361, accuracy: 0.9704, train_set: not normal
epoch: 176, learning rate: 0.005, batch size: 200, average loss: 0.0855941821, accuracy: 0.9699, train_set: normal
epoch: 169, learning rate: 0.005, batch size: 200, average loss: 0.0840451158, accuracy: 0.9699, train_set: not normal
epoch: 181, learning rate: 0.005, batch size: 200, average loss: 0.0859571979, accuracy: 0.9699, train_set: normal
epoch: 197, learning rate: 0.005, batch size: 200, average loss: 0.0866671357, accuracy: 0.9698, train_set: not normal
epoch: 162, learning rate: 0.005, batch size: 200, average loss: 0.0801894657, accuracy: 0.9698, train_set: not normal
epoch: 170, learning rate: 0.005, batch size: 200, average loss: 0.0910011409, accuracy: 0.9695, train_set: normal
epoch: 189, learning rate: 0.005, batch size: 200, average loss: 0.0871327942, accuracy: 0.9695, train_set: normal
epoch: 183, learning rate: 0.005, batch size: 200, average loss: 0.0858964958, accuracy: 0.9694, train_set: not normal
epoch: 177, learning rate: 0.005, batch size: 200, average loss: 0.0875864697, accuracy: 0.9692, train_set: not normal
epoch: 185, learning rate: 0.005, batch size: 200, average loss: 0.0897819514, accuracy: 0.9691, train_set: normal
epoch: 194, learning rate: 0.005, batch size: 200, average loss: 0.0898394302, accuracy: 0.9690, train_set: normal
epoch: 152, learning rate: 0.005, batch size: 200, average loss: 0.0865142823, accuracy: 0.9689, train_set: normal
epoch: 198, learning rate: 0.005, batch size: 200, average loss: 0.0892004568, accuracy: 0.9689, train_set: normal
epoch: 163, learning rate: 0.005, batch size: 200, average loss: 0.0899946193, accuracy: 0.9689, train_set: not normal
epoch: 181, learning rate: 0.005, batch size: 100, average loss: 0.1554519005, accuracy: 0.9448, train_set: not normal
epoch:  91, learning rate: 0.005, batch size: 100, average loss: 0.1557571931, accuracy: 0.9447, train_set: normal
epoch: 187, learning rate: 0.005, batch size: 100, average loss: 0.1759182699, accuracy: 0.9447, train_set: normal
epoch:  50, learning rate: 0.005, batch size: 200, average loss: 0.1488474063, accuracy: 0.9446, train_set: normal
epoch: 143, learning rate: 0.005, batch size: 100, average loss: 0.1635714449, accuracy: 0.9446, train_set: normal
epoch: 164, learning rate: 0.005, batch size: 100, average loss: 0.1574821307, accuracy: 0.9446, train_set: not normal
epoch: 137, learning rate: 0.005, batch size: 100, average loss: 0.1480821254, accuracy: 0.9445, train_set: not normal
epoch:  42, learning rate: 0.005, batch size: 200, average loss: 0.1468473995, accuracy: 0.9445, train_set: not normal
epoch: 168, learning rate: 0.005, batch size: 100, average loss: 0.1608339588, accuracy: 0.9445, train_set: not normal
epoch: 186, learning rate: 0.005, batch size: 100, average loss: 0.1570072239, accuracy: 0.9445, train_set: not normal
epoch:  98, learning rate: 0.005, batch size: 100, average loss: 0.1534497880, accuracy: 0.9443, train_set: not normal
epoch: 187, learning rate: 0.010, batch size: 200, average loss: 0.1552866464, accuracy: 0.9443, train_set: not normal
epoch: 138, learning rate: 0.005, batch size: 100, average loss: 0.1566563757, accuracy: 0.9443, train_set: not normal
epoch:  90, learning rate: 0.005, batch size: 100, average loss: 0.1541579747, accuracy: 0.9442, train_set: normal
epoch: 121, learning rate: 0.005, batch size: 100, average loss: 0.1533592629, accuracy: 0.9442, train_set: not normal

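The learning rate = 0.005, batch size = 200 group wins hands down, followed by learning rate = 0.005, batch size = 100; a learning rate = 0.01 entry only shows up very far down the list. Overall, the data that was not normalized still tends to train better.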
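Because the sorted list mixes every epoch of every run, it can be easier to compare configurations by their best epoch only. A minimal sketch, assuming `results` uses the same dictionary keys as in `main` (the four entries below are copied from the listing above with `average_loss` omitted, and `best_per_config` is a hypothetical helper):

```python
def best_per_config(results):
    """Keep only the highest-accuracy entry for each (lr, batch_size, train_set)."""
    best = {}
    for r in results:
        key = (r['lr'], r['batch_size'], r['train_set'])
        if key not in best or r['accuracy'] > best[key]['accuracy']:
            best[key] = r
    return sorted(best.values(), key=lambda r: r['accuracy'], reverse=True)

results = [
    {'epoch': 174, 'lr': 0.005, 'batch_size': 200, 'accuracy': 0.9743, 'train_set': 'not normal'},
    {'epoch': 193, 'lr': 0.005, 'batch_size': 200, 'accuracy': 0.9724, 'train_set': 'not normal'},
    {'epoch': 184, 'lr': 0.005, 'batch_size': 200, 'accuracy': 0.9711, 'train_set': 'normal'},
    {'epoch': 181, 'lr': 0.005, 'batch_size': 100, 'accuracy': 0.9448, 'train_set': 'not normal'},
]

summary = best_per_config(results)
for r in summary:
    print(r)
```

This reduces the listing to one line per configuration, which makes the gap between batch size 200 and batch size 100 easier to see.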



Sorted Results (by average correct):
epoch: 176, learning rate: 0.005, batch size: 200, average loss: 0.0765673242, accuracy: 0.9734, train_set: normal
epoch: 194, learning rate: 0.005, batch size: 200, average loss: 0.0793302470, accuracy: 0.9717, train_set: normal
epoch: 185, learning rate: 0.005, batch size: 200, average loss: 0.0785266966, accuracy: 0.9716, train_set: normal
epoch: 199, learning rate: 0.005, batch size: 200, average loss: 0.0830905189, accuracy: 0.9713, train_set: normal
epoch: 198, learning rate: 0.005, batch size: 200, average loss: 0.0838561028, accuracy: 0.9704, train_set: normal
epoch: 162, learning rate: 0.005, batch size: 200, average loss: 0.0835923307, accuracy: 0.9702, train_set: normal
epoch: 187, learning rate: 0.005, batch size: 200, average loss: 0.0853577284, accuracy: 0.9701, train_set: normal
epoch: 190, learning rate: 0.005, batch size: 200, average loss: 0.0867463920, accuracy: 0.9699, train_set: normal
epoch: 178, learning rate: 0.005, batch size: 200, average loss: 0.0827808466, accuracy: 0.9698, train_set: not normal
epoch: 139, learning rate: 0.005, batch size: 200, average loss: 0.0832820188, accuracy: 0.9697, train_set: normal
epoch: 186, learning rate: 0.005, batch size: 200, average loss: 0.0891305132, accuracy: 0.9695, train_set: normal
epoch: 169, learning rate: 0.005, batch size: 200, average loss: 0.0867678824, accuracy: 0.9693, train_set: normal
epoch: 127, learning rate: 0.005, batch size: 200, average loss: 0.0840070248, accuracy: 0.9690, train_set: normal
epoch: 172, learning rate: 0.005, batch size: 200, average loss: 0.0844018143, accuracy: 0.9689, train_set: not normal
epoch: 172, learning rate: 0.005, batch size: 200, average loss: 0.0894527279, accuracy: 0.9686, train_set: normal
epoch: 191, learning rate: 0.005, batch size: 200, average loss: 0.0924629187, accuracy: 0.9686, train_set: normal
epoch: 138, learning rate: 0.005, batch size: 200, average loss: 0.0860579868, accuracy: 0.9684, train_set: normal
epoch: 155, learning rate: 0.005, batch size: 200, average loss: 0.0889876820, accuracy: 0.9684, train_set: normal
epoch: 184, learning rate: 0.005, batch size: 200, average loss: 0.0916729918, accuracy: 0.9683, train_set: normal
epoch: 195, learning rate: 0.005, batch size: 200, average loss: 0.0919644795, accuracy: 0.9683, train_set: normal
epoch: 179, learning rate: 0.005, batch size: 200, average loss: 0.0912870209, accuracy: 0.9681, train_set: normal
epoch: 167, learning rate: 0.005, batch size: 200, average loss: 0.0868042411, accuracy: 0.9681, train_set: normal
epoch: 192, learning rate: 0.005, batch size: 200, average loss: 0.0872696730, accuracy: 0.9680, train_set: not normal
epoch: 145, learning rate: 0.005, batch size: 200, average loss: 0.0849740525, accuracy: 0.9680, train_set: not normal
epoch: 154, learning rate: 0.005, batch size: 200, average loss: 0.0902465764, accuracy: 0.9678, train_set: normal
epoch: 174, learning rate: 0.005, batch size: 200, average loss: 0.0930305983, accuracy: 0.9678, train_set: normal
epoch: 175, learning rate: 0.005, batch size: 200, average loss: 0.0949433674, accuracy: 0.9677, train_set: normal
epoch: 180, learning rate: 0.005, batch size: 200, average loss: 0.0964070747, accuracy: 0.9677, train_set: normal
epoch: 145, learning rate: 0.005, batch size: 200, average loss: 0.0905236108, accuracy: 0.9676, train_set: normal
epoch: 157, learning rate: 0.005, batch size: 200, average loss: 0.0905602483, accuracy: 0.9676, train_set: normal
epoch: 184, learning rate: 0.005, batch size: 200, average loss: 0.0935066586, accuracy: 0.9675, train_set: not normal
epoch: 189, learning rate: 0.005, batch size: 200, average loss: 0.0949949484, accuracy: 0.9675, train_set: normal

4.6.PyTorch DataLoader Source Code - Debugging Session

This is another debugging session, covering how dataset normalization is carried out under the hood. Just watch the video: 37-PyTorch DataLoader Source Code - Debugging Session_哔哩哔哩_bilibili

4.7.PyTorch Sequential Models - Neural Networks Made Easy


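I don't think there is any need to dig into the underlying principles or the implementation here; knowing how to use it is enough. It is not difficult, and reading the code is mostly self-explanatory. If you want to go deeper, watch the video: 38-PyTorch Sequential Models - Neural Networks Made Easy_哔哩哔哩_bilibili. Here is the code: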

python
import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

from collections import OrderedDict


train_set = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed download directory
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

image, label = train_set[0]
image = image.unsqueeze(0)

# method 1
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        return t

network = Network()

# method 2
sequential1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10)
)

# method 3
layers = OrderedDict([
    ('conv1', nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)),
    ('relu1', nn.ReLU()),
    ('maxpool1', nn.MaxPool2d(kernel_size=2, stride=2)),
    ('conv2', nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)),
    ('relu2', nn.ReLU()),
    ('maxpool2', nn.MaxPool2d(kernel_size=2, stride=2)),
    ('flatten', nn.Flatten()),
    ('fc1', nn.Linear(in_features=12 * 4 * 4, out_features=120)),
    ('relu3', nn.ReLU()),
    ('fc2', nn.Linear(in_features=120, out_features=60)),
    ('relu4', nn.ReLU()),
    ('out', nn.Linear(in_features=60, out_features=10))
])
sequential2 = nn.Sequential(layers)

# method 4
sequential3 = nn.Sequential()
sequential3.add_module('conv1', nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5))
sequential3.add_module('relu1', nn.ReLU())
sequential3.add_module('maxpool1', nn.MaxPool2d(kernel_size=2, stride=2))
sequential3.add_module('conv2', nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5))
sequential3.add_module('relu2', nn.ReLU())
sequential3.add_module('maxpool2', nn.MaxPool2d(kernel_size=2, stride=2))
sequential3.add_module('flatten', nn.Flatten())
sequential3.add_module('fc1', nn.Linear(in_features=12 * 4 * 4, out_features=120))
sequential3.add_module('relu3', nn.ReLU())
sequential3.add_module('fc2', nn.Linear(in_features=120, out_features=60))
sequential3.add_module('relu4', nn.ReLU())
sequential3.add_module('out', nn.Linear(in_features=60, out_features=10))

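# test
print(network)
print(sequential1)
print(sequential2)
print(sequential3)

print(network(image))
print(sequential1(image))
print(sequential2(image))
print(sequential3(image))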



Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)
Sequential(
  (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Flatten(start_dim=1, end_dim=-1)
  (7): Linear(in_features=192, out_features=120, bias=True)
  (8): ReLU()
  (9): Linear(in_features=120, out_features=60, bias=True)
  (10): ReLU()
  (11): Linear(in_features=60, out_features=10, bias=True)
)
Sequential(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (relu1): ReLU()
  (maxpool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (relu2): ReLU()
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (relu3): ReLU()
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (relu4): ReLU()
  (out): Linear(in_features=60, out_features=10, bias=True)
)
Sequential(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (relu1): ReLU()
  (maxpool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (relu2): ReLU()
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (relu3): ReLU()
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (relu4): ReLU()
  (out): Linear(in_features=60, out_features=10, bias=True)
)

tensor([[-0.0957,  0.1053, -0.1055,  0.1547, -0.0366, -0.0132,  0.0749, -0.1152,  0.0426,  0.0639]], grad_fn=<AddmmBackward0>)
tensor([[-0.0957,  0.1053, -0.1055,  0.1547, -0.0366, -0.0132,  0.0749, -0.1152,  0.0426,  0.0639]], grad_fn=<AddmmBackward0>)
tensor([[-0.0957,  0.1053, -0.1055,  0.1547, -0.0366, -0.0132,  0.0749, -0.1152,  0.0426,  0.0639]], grad_fn=<AddmmBackward0>)
tensor([[-0.0957,  0.1053, -0.1055,  0.1547, -0.0366, -0.0132,  0.0749, -0.1152,  0.0426,  0.0639]], grad_fn=<AddmmBackward0>)
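The four predictions above are identical, which can only happen if all four networks start from the same weights. One way to get that, sketched here under the assumption that the random seed is reset right before each network is built (the seed value 50 and the `build` helper are illustrative and not taken from the code above):

```python
import torch
import torch.nn as nn

def build():
    # Hypothetical helper: a small Sequential model built the same way on every call.
    return nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Flatten(),
        nn.Linear(in_features=6 * 12 * 12, out_features=10),
    )

torch.manual_seed(50)  # assumed seed value
net_a = build()
torch.manual_seed(50)  # reset so net_b draws the same initial weights
net_b = build()

image = torch.rand(1, 1, 28, 28)
same = torch.equal(net_a(image), net_b(image))
print(same)
```

Without the second `torch.manual_seed` call the two networks would be initialized from different random draws and their predictions would generally differ.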

4.8.Batch Norm In PyTorch - Add Normalization To Conv Net Layers

Batch Normalization (batch norm). Like Section 4.5, it offers another possible way to improve the network.
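As a rough illustration of what a batch norm layer does, the sketch below passes a random batch (a hypothetical stand-in for real feature maps) through `nn.BatchNorm2d` in training mode and checks that each channel comes out with roughly zero mean and unit variance:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical activations: batch of 100, 6 channels, 12x12 feature maps,
# deliberately shifted and scaled away from zero mean / unit variance.
x = torch.randn(100, 6, 12, 12) * 3.0 + 5.0

bn = nn.BatchNorm2d(num_features=6)
bn.train()  # in training mode the statistics of the current batch are used
y = bn(x)

# Statistics are taken per channel, over the batch and spatial dimensions.
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)
print(channel_mean)
print(channel_var)
```

Unlike the dataset normalization in Section 4.5, which is applied once to the inputs, this normalization happens inside the network on every forward pass, and the layer also learns a scale and shift per channel.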

Note: the code in this lesson sets num_workers. If you run it on Windows, either remove that setting or move that part of the code into the main function.

I won't go into the theory. If you are interested, watch the first 4 minutes of the video: 39-Batch Norm in PyTorch - Add Normalization to Conv Net Layers_哔哩哔哩_bilibili. Here are the two networks:

python
# network1: no batch norm
network1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10)
)

# network2: same layers plus batch norm (layer placement assumed)
network2 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.BatchNorm2d(6),  # assumed placement
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.BatchNorm1d(120),  # assumed placement
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10)
)


python
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # short for optimizer. This can give us access to the optimizer we will use to update weights.

import torchvision
import torchvision.transforms as transforms

from collections import OrderedDict
from collections import namedtuple
from itertools import product

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

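class RunBuilder:
    @staticmethod
    def get_runs(params):
        # Build one named tuple per combination of parameter values (Cartesian product).
        Run = namedtuple('Run', params.keys())
        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))
        return runs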


train_set_not_normal = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed download directory
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

loader = torch.utils.data.DataLoader(train_set_not_normal, batch_size=len(train_set_not_normal), num_workers=4)
data = next(iter(loader))

train_set_normal = torchvision.datasets.FashionMNIST(
    root='./data'  # assumed download directory
    , train=True
    , download=True
    , transform=transforms.Compose([
        transforms.ToTensor(),
        # normalize
        transforms.Normalize((data[0].mean()), (data[0].std()))
    ])
)

network1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10)
)

network2 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.BatchNorm2d(6),  # assumed placement
    nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(in_features=12 * 4 * 4, out_features=120),
    nn.ReLU(),
    nn.BatchNorm1d(120),  # assumed placement
    nn.Linear(in_features=120, out_features=60),
    nn.ReLU(),
    nn.Linear(in_features=60, out_features=10)
)

params = OrderedDict(
    network=[network1, network2],
    train_set=[train_set_not_normal, train_set_normal],
    lr=[.01, .005],
    batch_size=[100, 200],
    epoch=[200],
    num_workers=[1],  # assumed value
    device=['cpu'],  # assumed value
)

def main():
    results = []

    for run in RunBuilder.get_runs(params):
        comment = f'-{run}'

        network = run.network

        train_loader = torch.utils.data.DataLoader(dataset=run.train_set,
                                                   batch_size=run.batch_size,
                                                   num_workers=run.num_workers)  # step 1: Get batch from the training set.
        optimizer = optim.Adam(network.parameters(), lr=run.lr)

        device = run.device

        train_start_time = time.time()
        for epoch in range(run.epoch):  # step 7: Repeat steps 1-6 for as many epochs required to obtain the desired level of accuracy.
            total_loss = 0
            total_correct = 0

            for batch in train_loader:  # step 6: Repeat steps 1-5 until one epoch is completed.
                images, labels = batch
                images, labels = images.to(device), labels.to(device)

                preds = network(images)  # step 2: Pass batch to network.
                loss = F.cross_entropy(preds, labels)  # step 3: Calculate the loss(difference between the predicted values and the true values).

                optimizer.zero_grad()  # clear the gradients left over from the previous batch
                loss.backward()  # step 4: Calculate the gradient of the loss function w.r.t the network's weights.
                optimizer.step()  # step 5: Update the weights using the gradients to reduce the loss.

                total_loss += loss.item()
                total_correct += get_num_correct(preds, labels)

            print(f"epoch: {epoch:3d}, "
                  f"average loss: {total_loss / len(train_loader):.10f}, "
                  f"average correct: {total_correct / len(train_loader):.10f}")

            results.append({
                'epoch': epoch,
                'lr': run.lr,
                'batch_size': run.batch_size,
                'average_loss': (total_loss / len(train_loader)),
                'accuracy': (total_correct / len(train_loader) / run.batch_size),
                'train_set': 'not normal' if run.train_set == train_set_not_normal else 'normal',
                'batch_norm': 'not normal' if run.network == network1 else 'normal'
            })

        train_end_time = time.time()
        print(f"train time: {train_end_time - train_start_time:.10f}")

    sorted_results = sorted(results, key=lambda x: x['accuracy'], reverse=True)

    print("\nSorted Results (by average correct):")
    for result in sorted_results:
        print(f"epoch: {result['epoch']:3d}, "
              f"learning rate: {result['lr']:.3f}, "
              f"batch size: {result['batch_size']}, "
              f"average loss: {result['average_loss']:.10f}, "
              f"accuracy: {result['accuracy']:.4f}, "
              f"train_set: {result['train_set']:10s}, "
              f"batch_norm: {result['batch_norm']:10s}")

if __name__ == '__main__':
    main()

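Note: num_workers=4 is set here. If you run this on Windows, either remove that part or move it into the main function! The same applies to the rest of the code, so I won't repeat this reminder.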
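One way to follow this note is to create every `DataLoader` that uses worker processes inside a function that is reached only through the `if __name__ == '__main__':` guard. A minimal sketch, assuming a small random `TensorDataset` in place of FashionMNIST (`build_loader` is a hypothetical helper, not part of the code above):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def build_loader(num_workers):
    # Hypothetical stand-in dataset: 64 random "images" with integer labels.
    dataset = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
    return DataLoader(dataset, batch_size=16, num_workers=num_workers)

def main(num_workers=0):
    # The loader is created and iterated here, not at module level, so worker
    # processes that re-import this file on Windows do not re-run the loading code.
    loader = build_loader(num_workers)
    return sum(images.shape[0] for images, labels in loader)

if __name__ == '__main__':
    print(main(num_workers=0))  # raise num_workers once the guard is in place
```

In the full script above, this would mean moving the `loader = ...` and `data = next(iter(loader))` lines, along with the dataset that depends on them, into `main`.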



python
params = OrderedDict(
    train_set=[train_set_not_normal, train_set_normal],


Sorted Results (by average correct):
epoch: 169, learning rate: 0.005, batch size: 200, average loss: 0.0054161780, accuracy: 0.9982, train_set: normal    , batch_norm: normal    
epoch: 157, learning rate: 0.005, batch size: 200, average loss: 0.0066549324, accuracy: 0.9980, train_set: normal    , batch_norm: normal    
epoch: 138, learning rate: 0.005, batch size: 200, average loss: 0.0070235718, accuracy: 0.9978, train_set: normal    , batch_norm: normal    
epoch: 168, learning rate: 0.005, batch size: 200, average loss: 0.0075482397, accuracy: 0.9978, train_set: normal    , batch_norm: normal    
epoch:  23, learning rate: 0.005, batch size: 200, average loss: 0.0063228831, accuracy: 0.9978, train_set: normal    , batch_norm: normal    
epoch:  67, learning rate: 0.005, batch size: 200, average loss: 0.0068497834, accuracy: 0.9978, train_set: normal    , batch_norm: normal    
epoch: 108, learning rate: 0.005, batch size: 200, average loss: 0.0073561658, accuracy: 0.9977, train_set: normal    , batch_norm: normal    
epoch: 199, learning rate: 0.005, batch size: 200, average loss: 0.0067795254, accuracy: 0.9977, train_set: normal    , batch_norm: normal    
epoch: 176, learning rate: 0.005, batch size: 200, average loss: 0.0071034447, accuracy: 0.9977, train_set: normal    , batch_norm: normal    
epoch: 149, learning rate: 0.005, batch size: 200, average loss: 0.0074301912, accuracy: 0.9977, train_set: normal    , batch_norm: normal    
epoch:  89, learning rate: 0.005, batch size: 200, average loss: 0.0091262234, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  96, learning rate: 0.005, batch size: 200, average loss: 0.0070427983, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch: 119, learning rate: 0.005, batch size: 200, average loss: 0.0064677731, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  13, learning rate: 0.005, batch size: 200, average loss: 0.0070177157, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  99, learning rate: 0.005, batch size: 200, average loss: 0.0068566560, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch: 100, learning rate: 0.005, batch size: 200, average loss: 0.0073862491, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch: 139, learning rate: 0.005, batch size: 200, average loss: 0.0074154620, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  20, learning rate: 0.005, batch size: 200, average loss: 0.0069951047, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  60, learning rate: 0.005, batch size: 200, average loss: 0.0078619326, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch: 173, learning rate: 0.005, batch size: 200, average loss: 0.0094505679, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  83, learning rate: 0.005, batch size: 200, average loss: 0.0070482437, accuracy: 0.9976, train_set: normal    , batch_norm: normal    
epoch:  32, learning rate: 0.005, batch size: 200, average loss: 0.0077434058, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 133, learning rate: 0.005, batch size: 200, average loss: 0.0078485893, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 185, learning rate: 0.005, batch size: 200, average loss: 0.0086544660, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 187, learning rate: 0.005, batch size: 200, average loss: 0.0086393397, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 172, learning rate: 0.005, batch size: 200, average loss: 0.0084888153, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 107, learning rate: 0.005, batch size: 200, average loss: 0.0080345759, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 115, learning rate: 0.005, batch size: 200, average loss: 0.0076788745, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 148, learning rate: 0.005, batch size: 200, average loss: 0.0087985902, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch:  19, learning rate: 0.005, batch size: 200, average loss: 0.0072416995, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch: 116, learning rate: 0.005, batch size: 200, average loss: 0.0080688546, accuracy: 0.9975, train_set: normal    , batch_norm: normal    
epoch:  44, learning rate: 0.005, batch size: 200, average loss: 0.0083799361, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch:  84, learning rate: 0.005, batch size: 200, average loss: 0.0083246451, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch: 140, learning rate: 0.005, batch size: 200, average loss: 0.0093504604, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch: 124, learning rate: 0.005, batch size: 200, average loss: 0.0074316791, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch: 180, learning rate: 0.005, batch size: 200, average loss: 0.0076402023, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch:  75, learning rate: 0.005, batch size: 200, average loss: 0.0081966773, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch: 132, learning rate: 0.005, batch size: 200, average loss: 0.0080940460, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch: 130, learning rate: 0.005, batch size: 200, average loss: 0.0084586431, accuracy: 0.9974, train_set: normal    , batch_norm: normal    
epoch: 145, learning rate: 0.005, batch size: 200, average loss: 0.0084587986, accuracy: 0.9973, train_set: not normal, batch_norm: normal    
epoch:  64, learning rate: 0.005, batch size: 200, average loss: 0.0085187728, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch:  62, learning rate: 0.005, batch size: 200, average loss: 0.0084701613, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch:  90, learning rate: 0.005, batch size: 200, average loss: 0.0074276641, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch: 141, learning rate: 0.005, batch size: 200, average loss: 0.0090999772, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch: 150, learning rate: 0.005, batch size: 200, average loss: 0.0091410023, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch:  14, learning rate: 0.005, batch size: 200, average loss: 0.0078943848, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch:  26, learning rate: 0.005, batch size: 200, average loss: 0.0085391161, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch:  29, learning rate: 0.005, batch size: 200, average loss: 0.0077431038, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch: 147, learning rate: 0.005, batch size: 200, average loss: 0.0095553209, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch: 165, learning rate: 0.005, batch size: 200, average loss: 0.0088865872, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch: 196, learning rate: 0.005, batch size: 200, average loss: 0.0081543449, accuracy: 0.9973, train_set: not normal, batch_norm: normal    
epoch:  50, learning rate: 0.005, batch size: 200, average loss: 0.0092922270, accuracy: 0.9973, train_set: normal    , batch_norm: normal    
epoch: 195, learning rate: 0.005, batch size: 200, average loss: 0.1036762961, accuracy: 0.9636, train_set: not normal, batch_norm: not normal
epoch: 128, learning rate: 0.010, batch size: 100, average loss: 0.0994206288, accuracy: 0.9636, train_set: not normal, batch_norm: normal    
epoch: 135, learning rate: 0.010, batch size: 100, average loss: 0.0996087921, accuracy: 0.9635, train_set: not normal, batch_norm: normal    
epoch: 125, learning rate: 0.010, batch size: 100, average loss: 0.0991111694, accuracy: 0.9635, train_set: not normal, batch_norm: normal    
epoch: 176, learning rate: 0.005, batch size: 200, average loss: 0.1000632046, accuracy: 0.9634, train_set: not normal, batch_norm: not normal
epoch:  34, learning rate: 0.010, batch size: 100, average loss: 0.1001493678, accuracy: 0.9634, train_set: normal    , batch_norm: normal    
epoch: 169, learning rate: 0.005, batch size: 200, average loss: 0.1025721032, accuracy: 0.9632, train_set: not normal, batch_norm: not normal
epoch: 116, learning rate: 0.010, batch size: 100, average loss: 0.1014422027, accuracy: 0.9629, train_set: not normal, batch_norm: normal    
epoch: 122, learning rate: 0.010, batch size: 100, average loss: 0.0997589069, accuracy: 0.9629, train_set: not normal, batch_norm: normal    
epoch: 112, learning rate: 0.010, batch size: 100, average loss: 0.1017966709, accuracy: 0.9626, train_set: not normal, batch_norm: normal    
epoch: 126, learning rate: 0.010, batch size: 100, average loss: 0.1000356127, accuracy: 0.9626, train_set: not normal, batch_norm: normal    
epoch: 127, learning rate: 0.010, batch size: 100, average loss: 0.1012638301, accuracy: 0.9625, train_set: not normal, batch_norm: normal    
epoch: 193, learning rate: 0.005, batch size: 200, average loss: 0.1030388612, accuracy: 0.9624, train_set: not normal, batch_norm: not normal
epoch:  72, learning rate: 0.005, batch size: 200, average loss: 0.1044078310, accuracy: 0.9624, train_set: not normal, batch_norm: not normal
epoch: 121, learning rate: 0.010, batch size: 100, average loss: 0.1004619163, accuracy: 0.9624, train_set: not normal, batch_norm: normal    
epoch: 114, learning rate: 0.010, batch size: 100, average loss: 0.1037834815, accuracy: 0.9623, train_set: not normal, batch_norm: normal    
epoch:  44, learning rate: 0.005, batch size: 200, average loss: 0.1063651085, accuracy: 0.9622, train_set: not normal, batch_norm: not normal
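
The listing above is ordered by accuracy, which makes it hard to see at a glance how each hyperparameter combination fared. As a minimal sketch (not the code that produced the listing), the lines can be parsed back into records and reduced to the best row per configuration. The names `PATTERN`, `parse_line` and `best_per_config` are hypothetical helpers introduced here for illustration; the three sample lines are copied from the listing with trailing padding trimmed.

```python
import re

# Three lines copied from the results listing (trailing padding trimmed).
LOG = """\
epoch: 168, learning rate: 0.005, batch size: 200, average loss: 0.0075482397, accuracy: 0.9978, train_set: normal    , batch_norm: normal
epoch: 195, learning rate: 0.005, batch size: 200, average loss: 0.1036762961, accuracy: 0.9636, train_set: not normal, batch_norm: not normal
epoch: 128, learning rate: 0.010, batch size: 100, average loss: 0.0994206288, accuracy: 0.9636, train_set: not normal, batch_norm: normal
"""

# Hypothetical pattern matching the line layout shown in the listing.
PATTERN = re.compile(
    r"epoch:\s*(?P<epoch>\d+), learning rate: (?P<lr>[\d.]+), "
    r"batch size: (?P<batch_size>\d+), average loss: (?P<loss>[\d.]+), "
    r"accuracy: (?P<accuracy>[\d.]+), train_set: (?P<train_set>[^,]+?)\s*, "
    r"batch_norm: (?P<batch_norm>.+?)\s*$"
)


def parse_line(line):
    """Turn one result line into a dict with typed values."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    fields = match.groupdict()
    return {
        "epoch": int(fields["epoch"]),
        "lr": float(fields["lr"]),
        "batch_size": int(fields["batch_size"]),
        "loss": float(fields["loss"]),
        "accuracy": float(fields["accuracy"]),
        "train_set": fields["train_set"],
        "batch_norm": fields["batch_norm"],
    }


def best_per_config(rows):
    """Keep the highest-accuracy row for each hyperparameter combination."""
    best = {}
    for row in rows:
        key = (row["lr"], row["batch_size"], row["train_set"], row["batch_norm"])
        if key not in best or row["accuracy"] > best[key]["accuracy"]:
            best[key] = row
    return best


rows = [parse_line(line) for line in LOG.splitlines()]
# sorted() is stable, so rows with equal accuracy keep their original order.
ranked = sorted(rows, key=lambda row: row["accuracy"], reverse=True)
best = best_per_config(rows)

for key, row in best.items():
    print(key, "-> epoch", row["epoch"], "accuracy", row["accuracy"])
```

Feeding the full listing through `parse_line` in the same way would give one summary row per combination of learning rate, batch size, training-set normalization and batch-norm setting.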






