PyTorch for Complete Beginners, Part 9: Improving Classification Model Performance and Nonlinear Activation Functions

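[1 Improving model performance](#1 Improving model performance)
[2 Fitting the circles data with a linear model](#2 Fitting the circles data with a linear model)
[3 Fitting a linear equation with a linear model](#3 Fitting a linear equation with a linear model)
[4 Fitting the circles data with nonlinear activation functions](#4 Fitting the circles data with nonlinear activation functions)
[5 Reimplementing nonlinear activation functions](#5 Reimplementing nonlinear activation functions)
- [5.1 ReLU](#5.1 ReLU)
- [5.2 sigmoid](#5.2 sigmoid)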

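OK, today we'll learn how to improve a model's performance.

1 Improving model performance

Here are a few ideas worth considering:

- Add more layers
- Add more neurons (hidden units) per layer
- Train for more epochs
- Choose a better loss function
- Adjust the learning rate
- Choose a better optimizer
- Change the activation function

Let's build a model and see how it goes. Let's give it a try!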
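
As a rough map of where each of these knobs lives in PyTorch code, here is a minimal sketch. The helper name `build_model` and every concrete value below are illustrative assumptions, not the settings used later in this post.

```python
import torch
from torch import nn

def build_model(hidden_units: int = 10, hidden_layers: int = 2,
                activation: type = nn.ReLU) -> nn.Sequential:
    """Hypothetical helper: a small binary classifier for 2-feature inputs."""
    # First hidden layer takes the 2 input features
    layers = [nn.Linear(2, hidden_units), activation()]
    # Remaining hidden layers
    for _ in range(hidden_layers - 1):
        layers += [nn.Linear(hidden_units, hidden_units), activation()]
    # Output layer produces a single logit
    layers.append(nn.Linear(hidden_units, 1))
    return nn.Sequential(*layers)

model = build_model(hidden_units=16, hidden_layers=3)        # layers, neurons, activation
loss_fn = nn.BCEWithLogitsLoss()                             # loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)    # optimizer and learning rate
epochs = 500                                                 # number of training epochs
```

Changing one knob at a time makes it easier to tell which change actually helped.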

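2 Fitting the circles data with a linear model

# Make the data
from sklearn.datasets import make_circles

# Create 1000 samples
n_samples = 1000

# Create the circles dataset
X, y = make_circles(n_samples,
                    noise=0.03, # noise added to each point
                    random_state=42) # fix the seed so we get the same values every time

Let's visualize the data.
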

import matplotlib.pyplot as plt
plt.scatter(x=X[:,0],
            y=X[:,1],
            c=y,
            cmap=plt.cm.RdYlBu)

# Convert the data to tensors with the default dtype (float32)
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

# Look at the first five samples
X[:5], y[:5]

(tensor([[ 0.7542,  0.2315],
         [-0.7562,  0.1533],
         [-0.8154,  0.1733],
         [-0.3937,  0.6929],
         [ 0.4422, -0.8967]]),
 tensor([1., 1., 1., 1., 0.]))

# Split the data into training and test sets
from sklearn.model_selection import train_test_split

# test_size=0.2 puts 20% of the data in the test set. The split is random, so random_state=42 makes the result reproducible
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)
len(X_train), len(y_train), len(X_test), len(y_test)

(800, 800, 200, 200)

import torch
import torch.nn as nn

# Device-agnostic code: use the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
device

class CircleClassificationV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)
        self.layer_2 = nn.Linear(in_features=10, out_features=10)
        self.layer_3 = nn.Linear(in_features=10, out_features=1)
    
    def forward(self, x):
        return self.layer_3(self.layer_2(self.layer_1(x)))
    
model_2 = CircleClassificationV2().to(device)
model_2

CircleClassificationV2(
  (layer_1): Linear(in_features=2, out_features=10, bias=True)
  (layer_2): Linear(in_features=10, out_features=10, bias=True)
  (layer_3): Linear(in_features=10, out_features=1, bias=True)
)

We can see that there are now three linear layers, and the number of out_features (hidden units) has gone up as well.

# Loss function
loss_fn = nn.BCEWithLogitsLoss()

# Optimizer
optimizer = torch.optim.SGD(params=model_2.parameters(),
                            lr=0.1)
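
The training loop that follows calls `accuracy_fn`, which is not defined anywhere in this post (it presumably comes from an earlier part of the series). A minimal sketch, assuming it returns the percentage of predictions that match the labels, which is consistent with the `%` sign in the printed logs:

```python
import torch

def accuracy_fn(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    """Assumed helper: percentage of predictions equal to the true labels."""
    correct = torch.eq(y_true, y_pred).sum().item()
    return correct / len(y_pred) * 100

# Quick check with made-up labels: 3 of the 4 predictions are right
print(accuracy_fn(torch.tensor([1., 0., 1., 1.]),
                  torch.tensor([1., 0., 0., 1.])))  # 75.0
```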

# Number of training epochs
epochs = 1000

# Put all the data on the same device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# Training loop
for epoch in range(epochs):
    model_2.train()
    y_logits = model_2(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))
    
    # Compute the loss and the accuracy
    loss = loss_fn(y_logits,
                   y_train)
    acc = accuracy_fn(y_true=y_train,
                      y_pred=y_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Evaluation
    model_2.eval()
    with torch.inference_mode():
        test_logits = model_2(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, 
                            y_test)
        test_acc = accuracy_fn(y_true=y_test,
                               y_pred=test_pred)
    # Print progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch:{epoch} | Train loss:{loss:.5f} | Train accuracy:{acc:.2f}% | Test loss:{test_loss:.4f} | Test accuracy:{test_acc:.2f}%")

Epoch:0 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:100 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:200 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:300 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:400 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:500 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:600 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:700 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:800 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%

Epoch:900 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
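
The training loss printed above sits at about 0.6938, which is very close to ln 2 ≈ 0.6931. That is the binary cross-entropy of a model that outputs a probability of 0.5 for every sample, whatever its label. A small check in plain Python (the function name `bce` is just for illustration):

```python
import math

def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one sample with predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Predicting 0.5 costs the same for either label: ln 2
print(round(bce(0.5, 1), 4), round(bce(0.5, 0), 4))  # 0.6931 0.6931
print(round(math.log(2), 4))                          # 0.6931
```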

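The loss does not go down at all and the accuracy stays at 50%, which means the model is classifying at random, like flipping a coin: 50% heads, 50% tails. Let's visualize it.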

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Training")
plot_decision_boundary(model_2, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Testing")
plot_decision_boundary(model_2, X_test, y_test)

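From the plot we can see that the decision boundary is still a straight line, up in the top-right corner. Does that mean our model isn't learning at all? Remember the linear regression we studied earlier, y = weight * X + bias? Let's try the model on that kind of data to find out whether it can learn.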

3 Fitting a linear equation with a linear model

# Create the data
weight = 0.7
bias = 0.3

start = 0
end = 1
step = 0.01

X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias

print(len(X), len(y))
print(X[:5], y[:5])

100 100

tensor([[0.0000],
        [0.0100],
        [0.0200],
        [0.0300],
        [0.0400]]) tensor([[0.3000],
        [0.3070],
        [0.3140],
        [0.3210],
        [0.3280]])

# Split the data into training and test sets
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

len(X_train), len(y_train), len(X_test), len(y_test)

(80, 80, 20, 20)

plot_predictions(train_data=X_train,
                 train_labels=y_train,
                 test_data=X_test,
                 test_labels=y_test)

# Device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

# Set the random seed on the CPU
torch.manual_seed(42)

# Set the random seed on the GPU
torch.cuda.manual_seed(42)

# Move the data to the selected device (the GPU here)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

model_2

CircleClassificationV2(
  (layer_1): Linear(in_features=2, out_features=10, bias=True)
  (layer_2): Linear(in_features=10, out_features=10, bias=True)
  (layer_3): Linear(in_features=10, out_features=1, bias=True)
)

This shows that the model expects 2 input features, while our linear regression data has only 1, so the input layer needs to change.
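
To see what goes wrong if the input size is left unchanged, here is a small sketch: a layer built with `in_features=2` rejects data that has a single feature per sample. The tensor below is random stand-in data with the same shape as the regression training set.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=2, out_features=10)
x = torch.rand(80, 1)  # one feature per sample, like the regression data

raised = False
try:
    layer(x)  # (80, 1) cannot be multiplied with a weight expecting 2 features
except RuntimeError as err:
    raised = True
    print("shape mismatch:", type(err).__name__)

# A layer with in_features=1 accepts the same data
print(nn.Linear(in_features=1, out_features=10)(x).shape)  # torch.Size([80, 10])
```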

# Build the model with nn.Sequential, which is simpler because the layers run in order; the structure is the same as model_2, apart from the single input feature
model_1 = nn.Sequential(
    nn.Linear(in_features = 1, out_features = 10),
    nn.Linear(in_features = 10, out_features = 10),
    nn.Linear(in_features = 10, out_features = 1)
)

model_1.to(device)
model_1

Sequential(
  (0): Linear(in_features=1, out_features=10, bias=True)
  (1): Linear(in_features=10, out_features=10, bias=True)
  (2): Linear(in_features=10, out_features=1, bias=True)
)

# Loss function: this is a regression problem, so we use MAE (L1 loss)
loss_fn = nn.L1Loss()

# Optimizer
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.01)

# Number of training epochs
epochs = 1000

for epoch in range(epochs):
    # Training
    model_1.train()

    # Forward pass
    y_pred = model_1(X_train)

    # Compute the loss
    loss = loss_fn(y_pred,
                   y_train)

    # Zero the gradients
    optimizer.zero_grad()

    # Backpropagation
    loss.backward()

    # Gradient descent step
    optimizer.step()

    # Evaluation
    model_1.eval()
    with torch.inference_mode():
        test_pred = model_1(X_test)
        test_loss = loss_fn(test_pred,
                            y_test)

    # Print the results every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch:{epoch} | Train loss:{loss:.4f} | Test loss:{test_loss:.4f}")

Epoch:0 | Train loss:0.7599 | Test loss:0.9110

Epoch:100 | Train loss:0.0286 | Test loss:0.0008

Epoch:200 | Train loss:0.0253 | Test loss:0.0021

Epoch:300 | Train loss:0.0214 | Test loss:0.0031

Epoch:400 | Train loss:0.0196 | Test loss:0.0034

Epoch:500 | Train loss:0.0194 | Test loss:0.0039

Epoch:600 | Train loss:0.0190 | Test loss:0.0038

Epoch:700 | Train loss:0.0188 | Test loss:0.0038

Epoch:800 | Train loss:0.0184 | Test loss:0.0033

Epoch:900 | Train loss:0.0180 | Test loss:0.0036

Let's visualize the result.


model_1.eval()
with torch.inference_mode():
    y_pred = model_1(X_test)

plot_predictions(train_data=X_train.cpu(),
                 train_labels=y_train.cpu(),
                 test_data=X_test.cpu(),
                 test_labels=y_test.cpu(),
                 predictions=y_pred.cpu())

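After adjusting the learning rate, the red and green points quickly move closer together. This shows that the model we defined really does learn from data. So why did it do so badly on the circles? Think about it: the circle data is clearly not linear. That brings us to a key concept: nonlinearity. Anyone who has looked into machine learning has probably heard of nonlinear functions such as ReLU, Sigmoid, and Tanh. Only by adding nonlinearity can the model fit this kind of data well.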
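
One way to see why stacking more linear layers did not help: with no activation in between, several `nn.Linear` layers compose into a single affine map, so the decision boundary can only ever be a straight line. A minimal sketch (the variable names are illustrative, and the layer sizes are chosen to mirror model_2):

```python
import torch
from torch import nn

torch.manual_seed(42)

# Three stacked linear layers with no activation, mirroring model_2's shape
stack = nn.Sequential(nn.Linear(2, 10), nn.Linear(10, 10), nn.Linear(10, 1))
l1, l2, l3 = stack

with torch.no_grad():
    # Collapse the stack into one weight matrix and one bias vector
    W = l3.weight @ l2.weight @ l1.weight                      # shape (1, 2)
    b = l3.weight @ (l2.weight @ l1.bias + l2.bias) + l3.bias  # shape (1,)

    x = torch.randn(5, 2)
    # The three-layer stack and the single affine map give the same output
    same = torch.allclose(stack(x), x @ W.T + b, atol=1e-5)

print(W.shape, b.shape, same)
```

Putting a nonlinear function such as ReLU between the layers breaks this collapse, which is what lets the network bend its decision boundary.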

4 Fitting the circles data with nonlinear activation functions

# Create the data
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000,
                    noise=0.03,
                    random_state=42)

# Convert X and y to tensors
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
X[:5], y[:5]

(tensor([[ 0.7542,  0.2315],
         [-0.7562,  0.1533],
         [-0.8154,  0.1733],
         [-0.3937,  0.6929],
         [ 0.4422, -0.8967]]),
 tensor([1., 1., 1., 1., 0.]))

# Plot the data to take a look
import matplotlib.pyplot as plt
plt.scatter(x = X[:,0],
            y = X[:,1],
            c=y,
            cmap=plt.cm.RdYlBu)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.2, 
                                                    random_state=42)
len(X_train), len(y_train), len(X_test), len(y_test)

(800, 800, 200, 200)

# Device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

# Put all the data on the same device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

Now create the model, instantiate it, and move it to the target device.


class CircleClassificationV3(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.layer1 = nn.Linear(in_features=2, out_features=10)
        self.layer2 = nn.Linear(in_features=10, out_features=10)
        self.layer3 = nn.Linear(in_features=10, out_features=1)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        # ReLU is applied twice here, once after each of the first two layers
        return self.layer3(self.relu(self.layer2(self.relu(self.layer1(x)))))
    
model_3 = CircleClassificationV3().to(device)
model_3

CircleClassificationV3(
  (layer1): Linear(in_features=2, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=10, bias=True)
  (layer3): Linear(in_features=10, out_features=1, bias=True)
  (relu): ReLU()
)

# Loss function
loss_fn = nn.BCEWithLogitsLoss()

# Optimizer
optimizer = torch.optim.SGD(params=model_3.parameters(),
                            lr=0.1)

# Check that the model output and the labels have the same shape
print((model_3(X_train).squeeze()).shape)
print(y_train.shape)

torch.Size([800])

torch.Size([800])
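
`nn.BCEWithLogitsLoss` takes raw logits, which is why the training loops in this post pass `y_logits` to `loss_fn` rather than the rounded predictions. A sketch with made-up numbers showing that it matches applying sigmoid first and then `nn.BCELoss`:

```python
import torch
from torch import nn

logits = torch.tensor([-1.5, 0.2, 2.0])   # made-up raw model outputs
targets = torch.tensor([0.0, 1.0, 1.0])   # made-up labels

# Loss computed directly on the logits
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
# Loss computed on probabilities obtained with sigmoid
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_with_logits, loss_from_probs, atol=1e-6))  # True
```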

# Set the random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Number of training epochs
epochs = 1000

for epoch in range(epochs):
    # Training phase
    model_3.train()
    y_logits = model_3(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))
    
    loss = loss_fn(y_logits, 
                   y_train)
    acc = accuracy_fn(y_true=y_train,
                      y_pred=y_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Evaluation phase
    model_3.eval()
    with torch.inference_mode():
        test_logits = model_3(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits,
                            y_test)
        test_acc = accuracy_fn(y_true=y_test,
                               y_pred=test_pred)
        
    # Print progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch:{epoch} | Train Loss:{loss:.4f} | Train Accuracy:{acc:.2f}% | Test Loss:{test_loss:.4f} | Test Accuracy:{test_acc:.2f}%")

Epoch:0 | Train Loss:0.6929 | Train Accuracy:50.00% | Test Loss:0.6932 | Test Accuracy:50.00%

Epoch:100 | Train Loss:0.6912 | Train Accuracy:52.88% | Test Loss:0.6910 | Test Accuracy:52.50%

Epoch:200 | Train Loss:0.6898 | Train Accuracy:53.37% | Test Loss:0.6894 | Test Accuracy:55.00%

Epoch:300 | Train Loss:0.6879 | Train Accuracy:53.00% | Test Loss:0.6872 | Test Accuracy:56.00%

Epoch:400 | Train Loss:0.6852 | Train Accuracy:52.75% | Test Loss:0.6841 | Test Accuracy:56.50%

Epoch:500 | Train Loss:0.6810 | Train Accuracy:52.75% | Test Loss:0.6794 | Test Accuracy:56.50%

Epoch:600 | Train Loss:0.6751 | Train Accuracy:54.50% | Test Loss:0.6729 | Test Accuracy:56.00%

Epoch:700 | Train Loss:0.6666 | Train Accuracy:58.38% | Test Loss:0.6632 | Test Accuracy:59.00%

Epoch:800 | Train Loss:0.6516 | Train Accuracy:64.00% | Test Loss:0.6476 | Test Accuracy:67.50%

Epoch:900 | Train Loss:0.6236 | Train Accuracy:74.00% | Test Loss:0.6215 | Test Accuracy:79.00%

Let's plot it and take a quick look.


plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_3, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_3, X_test, y_test)

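Wow, this plot looks really good: the model has managed to draw a circle! That shows our activation function is doing its job. We can use the ideas listed above to tune the model's performance further. Next, we'll put all of these steps together.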

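5 Reimplementing nonlinear activation functions

Above, we saw how adding activation functions to a model lets it capture nonlinear data. Now let's reimplement those functions by hand.

# Create a simple tensor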
A = torch.arange(-10, 10, 1, dtype=torch.float)
A

tensor([-10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,   1.,
          2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.])

plt.plot(A)

Now let's see how ReLU changes it.

5.1 ReLU


def relu(x):
    return torch.maximum(torch.tensor(0),x)

relu(A)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 2., 3., 4., 5., 6., 7.,
        8., 9.])

All the negative values have become 0.


plt.plot(relu(A))
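
As a sanity check, the handwritten function can be compared against PyTorch's built-in `torch.relu`. The block below repeats the definition of `relu` and the tensor `A` so that it runs on its own:

```python
import torch

def relu(x: torch.Tensor) -> torch.Tensor:
    # Element-wise max(0, x), same as the handwritten version above
    return torch.maximum(torch.tensor(0), x)

A = torch.arange(-10, 10, 1, dtype=torch.float)

# The handwritten and built-in versions agree on every element
print(torch.equal(relu(A), torch.relu(A)))  # True
```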

5.2 sigmoid

def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

sigmoid(A)

tensor([4.5398e-05, 1.2339e-04, 3.3535e-04, 9.1105e-04, 2.4726e-03, 6.6929e-03,
        1.7986e-02, 4.7426e-02, 1.1920e-01, 2.6894e-01, 5.0000e-01, 7.3106e-01,
        8.8080e-01, 9.5257e-01, 9.8201e-01, 9.9331e-01, 9.9753e-01, 9.9909e-01,
        9.9966e-01, 9.9988e-01])

plt.plot(sigmoid(A))
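
The same kind of check works for sigmoid, and it also shows the role sigmoid plays in the training loops earlier in this post: it squashes logits into probabilities, which `torch.round` then turns into 0/1 labels. The logits below are made up for illustration.

```python
import torch

def sigmoid(x: torch.Tensor) -> torch.Tensor:
    # Same formula as the handwritten version above
    return 1 / (1 + torch.exp(-x))

A = torch.arange(-10, 10, 1, dtype=torch.float)
print(torch.allclose(sigmoid(A), torch.sigmoid(A), atol=1e-6))  # True

# Made-up logits: sigmoid gives probabilities, round gives class labels
logits = torch.tensor([-2.0, -0.5, 0.3, 4.0])
probs = sigmoid(logits)
labels = torch.round(probs)
print(labels)  # tensor([0., 0., 1., 1.])
```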

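OK, I got through today's learning task without a hitch!
BB, did you eat properly today? The bullfrog at dinner was delicious, and the instant noodles were super nice too! I'm stuffed.

BB, if this post helped you, remember to give it a like! Sending hearts.

Thanks, thank you~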
