Basic Principles of the SGD Optimizer
Stochastic gradient descent (SGD) is an iterative method for optimizing differentiable objective functions; the basic idea behind it traces back to the Robbins-Monro algorithm of the 1950s.
It can be viewed as a stochastic approximation of gradient descent, because it replaces the actual gradient (computed from the entire dataset) with an estimate of it (computed from a randomly selected subset of the data). Especially in high-dimensional optimization problems, this sharply reduces the computational burden, trading a lower convergence rate for much cheaper iterations. SGD has become an important optimization method in machine learning.
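Concretely, where full-batch gradient descent computes the gradient of the loss over all $m$ samples at every step, SGD draws one sample (or a small mini-batch) $i_t$ at random and updates with its gradient alone:

$$w_{t+1} = w_t - \alpha \,\nabla L_{i_t}(w_t), \qquad i_t \sim \mathrm{Uniform}\{1, \dots, m\}$$

where $\alpha$ is the learning rate and $L_{i_t}$ is the loss on the sampled data.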
Following the algorithm above, here is a simple implementation of an SGD optimizer for a logistic-regression model:
import numpy as np
import matplotlib.pyplot as plt

# Helpers assumed by the optimizer below (not shown in the original article):
# a sigmoid activation and the cross-entropy cost of a logistic-regression model.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(xMat, weights, yMat):
    # Clip to avoid log(0) when the sigmoid saturates
    h = np.clip(sigmoid(xMat * weights), 1e-12, 1 - 1e-12)
    return float(-np.mean(np.multiply(yMat, np.log(h))
                          + np.multiply(1 - yMat, np.log(1 - h))))

def SGD(data_x, data_y, alpha=0.1, maxepochs=10000, epsilon=1e-4):
    xMat = np.mat(data_x)
    yMat = np.mat(data_y)
    m, n = xMat.shape
    weights = np.ones((n, 1))  # model parameters
    epochs_count = 0
    loss_list = []
    epochs_list = []
    while epochs_count < maxepochs:
        rand_i = np.random.randint(m)  # pick one sample at random
        loss = cost(xMat, weights, yMat)  # loss before this update
        hypothesis = sigmoid(np.dot(xMat[rand_i, :], weights))  # prediction for that sample
        error = hypothesis - yMat[rand_i, :]  # prediction error
        grad = np.dot(xMat[rand_i, :].T, error)  # gradient of the loss on that sample
        weights = weights - alpha * grad  # parameter update
        loss_new = cost(xMat, weights, yMat)  # loss after this update
        print(loss_new)
        if abs(loss_new - loss) < epsilon:
            break
        loss_list.append(loss_new)
        epochs_list.append(epochs_count)
        epochs_count += 1
    print('Stopped after iteration {}.'.format(epochs_count))
    plt.plot(epochs_list, loss_list)
    plt.xlabel('epochs')
    plt.ylabel('loss')
    plt.show()
    return weights
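As a quick sanity check, the optimizer above can be run on a small synthetic binary-classification problem; the data below is made up purely for illustration:

np.random.seed(0)
X = np.hstack([np.random.randn(200, 2), np.ones((200, 1))])  # 2 features plus a bias column
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)     # linearly separable labels
w = SGD(X, y, alpha=0.1, maxepochs=5000, epsilon=1e-6)
print(w)  # the learned weights should weight the two features roughly equally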
The SGD Optimization Method in MindSpore Framework Source Code
def decay_weight(self, gradients):
    """Apply weight decay (L2 regularization) to the gradients."""
    if self.exec_weight_decay:
        params = self._parameters
        weight_decay = self.get_weight_decay()
        if self.is_group:
            # Parameter groups: one decay value per group
            gradients = self.map_(F.partial(_apply_decay), weight_decay, self.decay_flags, params, gradients)
        else:
            # A single decay value shared by all parameters
            gradients = self.map_(F.partial(_apply_decay, weight_decay), self.decay_flags, params, gradients)
    return gradients

def get_weight_decay(self):
    """Return the current weight decay, evaluating dynamic schedules if set."""
    if self.dynamic_weight_decay:
        if self.is_group:
            weight_decay = ()
            for weight_decay_, flag_ in zip(self.weight_decay, self.dynamic_decay_flags):
                current_weight_decay = weight_decay_(self.global_step) if flag_ else weight_decay_
                weight_decay += (current_weight_decay,)
            return weight_decay
        return self.weight_decay(self.global_step)
    return self.weight_decay
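In essence, decay_weight folds the L2 weight-decay term into each gradient before the update step, and get_weight_decay resolves the current decay value, calling it as a schedule on self.global_step when dynamic weight decay is enabled. As a rough sketch of what the _apply_decay step computes per parameter (the real _apply_decay in MindSpore is a registered graph op operating on tensors; the function below only mirrors its arithmetic and its names are illustrative):

# Sketch only, not MindSpore's actual op.
def apply_decay_sketch(weight_decay, decay_flag, param, gradient):
    if decay_flag:
        return gradient + weight_decay * param  # add the L2 term to the gradient
    return gradient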
An SGD Application Case
The sections above introduced SGD and many of its refinements, but how to choose an optimization method and its parameters remains a real question.
In this part we design a set of comparison experiments around a single case to demonstrate how to use the SGD optimizer in practice. Through this case we hope to achieve three goals: first, to get started quickly with MindSpore and apply it in practice; second, to master how the SGD optimizer is used in the MindSpore framework; third, to compare the performance of the SGD optimizer under different parameter settings and explore how its parameters should be set.
Case Overview
Case Workflow
This case trains a LeNet model on the Fashion MNIST dataset, calling the SGD optimizer API provided by MindSpore, and sets up three experiments to compare how different SGD parameter choices affect model training. The learning rate is fixed at 0.01 for all three. Experiment 1 uses SGD with no extra parameters; experiment 2 uses SGD + momentum; experiment 3 uses SGD + momentum + nesterov. Finally, the experimental data are visualized and analyzed to draw conclusions. In theory, both momentum and Nesterov acceleration speed up network training; a sketch of the two update rules is given below.
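For reference, here is a minimal sketch of the two update rules in the common PyTorch-style formulation; MindSpore's nn.SGD implements the same idea, although details such as dampening handling may differ:

# Sketch of one parameter update with momentum / Nesterov momentum.
# w: parameter, g: gradient at w, v: velocity buffer, lr/mu: hyperparameters.
def sgd_momentum_step(w, g, v, lr=0.01, mu=0.9, nesterov=False):
    v = mu * v + g                  # exponentially decaying velocity
    if nesterov:
        w = w - lr * (g + mu * v)   # Nesterov: look ahead along the velocity
    else:
        w = w - lr * v
    return w, v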
About the Dataset
Fashion MNIST is a drop-in replacement for the classic MNIST dataset of handwritten (Arabic) digits; the two share the same image format and size, but Fashion MNIST is more challenging than regular MNIST. Both datasets are small and mainly suited to beginners learning the basics or verifying that an algorithm runs correctly; they are a good starting point for testing and debugging code.
Fashion MNIST contains 70,000 grayscale images: a training set of 60,000 examples and a test set of 10,000 examples, each a 28x28 grayscale image.
The dataset files are laid out as follows:
./mnist/
├── test
│ ├── t10k-images-idx3-ubyte
│ └── t10k-labels-idx1-ubyte
└── train
├── train-images-idx3-ubyte
└── train-labels-idx1-ubyte
Experimental Steps
First, set up the environment. MindSpore >= 1.8.1 is required, and mindvision must also be installed, along with a fix for an opencv compatibility issue. If you run this experiment in a Jupyter Notebook, restart the kernel after this step.
!pip install mindvision
!pip uninstall -y opencv-python-headless
!pip install "opencv-python-headless<4.3"
Import the required modules.
from mindspore import ops
from mindspore import nn
import csv
mindvision provides loaders for several classic datasets, including FashionMnist. We use the mindvision interface to download the FashionMnist dataset directly; no additional data preprocessing is needed.
from mindvision.classification.dataset import FashionMnist

download_train = FashionMnist(path="./FashionMnist", split="train", batch_size=32, repeat_num=1, shuffle=True,
                              resize=32, download=True)
download_test = FashionMnist(path="./FashionMnist", split="test", batch_size=32, resize=32, download=True)

train_dataset = download_train.run()
test_dataset = download_test.run()
Check the dataset structure.
for image, label in train_dataset.create_tuple_iterator():
    print(f"Shape of image [N, C, H, W]: {image.shape} {image.dtype}")
    print(f"Shape of label: {label.shape} {label.dtype}")
    break
Shape of image [N, C, H, W]: (32, 1, 32, 32) Float32
Shape of label: (32,) Int32
We use the LeNet network. Since this case compares optimizer algorithms, there are no special requirements on the network, so we simply use the interface provided by mindvision. Remember that the dataset consists of grayscale images, so the number of channels must be set to 1.
from mindvision.classification.models import lenet
network = lenet(num_classes=10, num_channel=1, include_top=True)
Create the training and test functions. The test function evaluates the model with two metrics: prediction accuracy on the test set, and average loss on the test set. Both values are saved to a csv file for later processing.
from mindspore import ops
from mindspore import nn

# Define the training function
def train(model, dataset, loss_fn, optimizer):
    # Define forward function
    def forward_fn(data, label):
        logits = model(data)
        loss = loss_fn(logits, label)
        return loss, logits

    # Get gradient function
    grad_fn = ops.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

    # Define function of one-step training
    def train_step(data, label):
        (loss, _), grads = grad_fn(data, label)
        loss = ops.depend(loss, optimizer(grads))
        return loss

    size = dataset.get_dataset_size()
    model.set_train()
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        loss = train_step(data, label)
        if batch % 100 == 0:
            loss, current = loss.asnumpy(), batch
            print(f"loss: {loss:>7f} [{current:>3d}/{size:>3d}]")

# Besides training, define a test function to evaluate model performance.
def test(model, dataset, loss_fn, writer):
    num_batches = dataset.get_dataset_size()
    model.set_train(False)
    total, test_loss, correct = 0, 0, 0
    for data, label in dataset.create_tuple_iterator():
        pred = model(data)
        total += len(data)
        test_loss += loss_fn(pred, label).asnumpy()
        correct += (pred.argmax(1) == label).asnumpy().sum()
    test_loss /= num_batches
    correct /= total
    print(f"Test: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    correct = round(correct * 100, 1)
    test_loss = round(test_loss, 6)
    writer.writerow([correct, test_loss])  # save the metrics to the csv file
Define the loss function and the optimizer. The loss function is cross-entropy, the optimizer is SGD, the number of epochs is 10, and the learning rate is fixed at 0.01. After every epoch, the model is evaluated on the test set to trace the training process.
Below is experiment 1, which uses plain SGD.
import csv
from mindvision.classification.models import lenet
from mindspore import nn

# Build the network model
network1 = lenet(num_classes=10, num_channel=1, include_top=True)

# Define the loss function
net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')

# Define the optimizer
net_opt = nn.SGD(network1.trainable_params(), learning_rate=1e-2)

# Set the number of epochs
epochs = 10

csv_file1 = open('result/sgd1.csv', 'w', newline='')
writer1 = csv.writer(csv_file1)
writer1.writerow(['Accuracy', 'Avg_loss'])
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(network1, train_dataset, net_loss, net_opt)
    test(network1, test_dataset, net_loss, writer1)
csv_file1.close()
print("Done!")
Epoch 1
-------------------------------
loss: 2.302577 [ 0/1875]
loss: 2.302847 [100/1875]
loss: 2.302751 [200/1875]
loss: 2.302980 [300/1875]
loss: 2.303654 [400/1875]
loss: 2.303366 [500/1875]
loss: 2.304052 [600/1875]
loss: 2.302124 [700/1875]
loss: 2.300979 [800/1875]
loss: 2.305120 [900/1875]
loss: 2.302595 [1000/1875]
loss: 2.302863 [1100/1875]
loss: 2.302272 [1200/1875]
loss: 2.302855 [1300/1875]
loss: 2.301332 [1400/1875]
loss: 2.302946 [1500/1875]
loss: 2.302062 [1600/1875]
loss: 2.301753 [1700/1875]
loss: 2.302556 [1800/1875]
Test:
Accuracy: 10.0%, Avg loss: 2.302537
Epoch 2
-------------------------------
loss: 2.302315 [ 0/1875]
loss: 2.302679 [100/1875]
loss: 2.303077 [200/1875]
loss: 2.302937 [300/1875]
loss: 2.303362 [400/1875]
loss: 2.303807 [500/1875]
loss: 2.301589 [600/1875]
loss: 2.301368 [700/1875]
loss: 2.299467 [800/1875]
loss: 2.304304 [900/1875]
loss: 2.303492 [1000/1875]
loss: 2.303529 [1100/1875]
loss: 2.304138 [1200/1875]
loss: 2.301667 [1300/1875]
loss: 2.301730 [1400/1875]
loss: 2.303048 [1500/1875]
loss: 2.303775 [1600/1875]
loss: 2.303029 [1700/1875]
loss: 2.302475 [1800/1875]
Test:
Accuracy: 16.2%, Avg loss: 2.302400
Epoch 3
-------------------------------
loss: 2.301794 [ 0/1875]
loss: 2.303500 [100/1875]
loss: 2.302805 [200/1875]
loss: 2.302227 [300/1875]
loss: 2.301216 [400/1875]
loss: 2.302775 [500/1875]
loss: 2.301936 [600/1875]
loss: 2.302594 [700/1875]
loss: 2.302720 [800/1875]
loss: 2.302242 [900/1875]
loss: 2.303227 [1000/1875]
loss: 2.301566 [1100/1875]
loss: 2.301122 [1200/1875]
loss: 2.301184 [1300/1875]
loss: 2.299739 [1400/1875]
loss: 2.302099 [1500/1875]
loss: 2.301378 [1600/1875]
loss: 2.299140 [1700/1875]
loss: 2.298317 [1800/1875]
Test:
Accuracy: 19.9%, Avg loss: 2.299189
Epoch 4
-------------------------------
loss: 2.298693 [ 0/1875]
loss: 2.298517 [100/1875]
loss: 2.295070 [200/1875]
loss: 2.287685 [300/1875]
loss: 2.273570 [400/1875]
loss: 2.171952 [500/1875]
loss: 1.555109 [600/1875]
loss: 1.315035 [700/1875]
loss: 1.290831 [800/1875]
loss: 0.957171 [900/1875]
loss: 0.683247 [1000/1875]
loss: 1.657022 [1100/1875]
loss: 0.885075 [1200/1875]
loss: 0.906517 [1300/1875]
loss: 0.904378 [1400/1875]
loss: 1.017345 [1500/1875]
loss: 1.311617 [1600/1875]
loss: 0.797807 [1700/1875]
loss: 0.899135 [1800/1875]
Test:
Accuracy: 69.3%, Avg loss: 0.824970
Epoch 5
-------------------------------
loss: 0.632757 [ 0/1875]
loss: 0.734659 [100/1875]
loss: 0.945924 [200/1875]
loss: 0.672319 [300/1875]
loss: 0.684902 [400/1875]
loss: 0.706946 [500/1875]
loss: 0.759072 [600/1875]
loss: 0.552457 [700/1875]
loss: 0.529598 [800/1875]
loss: 1.010510 [900/1875]
loss: 0.701660 [1000/1875]
loss: 0.749967 [1100/1875]
loss: 0.557507 [1200/1875]
loss: 0.663519 [1300/1875]
loss: 0.856946 [1400/1875]
loss: 0.672521 [1500/1875]
loss: 0.620852 [1600/1875]
loss: 0.940065 [1700/1875]
loss: 0.525178 [1800/1875]
Test:
Accuracy: 74.8%, Avg loss: 0.673452
Epoch 6
-------------------------------
loss: 0.526541 [ 0/1875]
loss: 0.551981 [100/1875]
loss: 0.437927 [200/1875]
loss: 0.536838 [300/1875]
loss: 0.612873 [400/1875]
loss: 0.663827 [500/1875]
loss: 0.462745 [600/1875]
loss: 0.553594 [700/1875]
loss: 0.411590 [800/1875]
loss: 0.748116 [900/1875]
loss: 0.510949 [1000/1875]
loss: 0.381978 [1100/1875]
loss: 0.571769 [1200/1875]
loss: 0.576415 [1300/1875]
loss: 0.674996 [1400/1875]
loss: 0.657864 [1500/1875]
loss: 0.638442 [1600/1875]
loss: 0.409129 [1700/1875]
loss: 0.637906 [1800/1875]
Test:
Accuracy: 78.8%, Avg loss: 0.573736
Epoch 7
-------------------------------
loss: 0.303391 [ 0/1875]
loss: 0.386003 [100/1875]
loss: 0.689905 [200/1875]
loss: 0.512580 [300/1875]
loss: 0.466622 [400/1875]
loss: 0.435113 [500/1875]
loss: 0.432267 [600/1875]
loss: 0.504910 [700/1875]
loss: 0.666079 [800/1875]
loss: 0.614079 [900/1875]
loss: 0.400944 [1000/1875]
loss: 0.448082 [1100/1875]
loss: 0.767068 [1200/1875]
loss: 0.498046 [1300/1875]
loss: 0.530967 [1400/1875]
loss: 0.754128 [1500/1875]
loss: 0.558144 [1600/1875]
loss: 0.258042 [1700/1875]
loss: 0.358890 [1800/1875]
Test:
Accuracy: 81.6%, Avg loss: 0.510896
Epoch 8
-------------------------------
loss: 0.506675 [ 0/1875]
loss: 0.753561 [100/1875]
loss: 0.468868 [200/1875]
loss: 0.422411 [300/1875]
loss: 0.396021 [400/1875]
loss: 0.393111 [500/1875]
loss: 0.963203 [600/1875]
loss: 0.572667 [700/1875]
loss: 0.288582 [800/1875]
loss: 0.419174 [900/1875]
loss: 0.356242 [1000/1875]
loss: 0.464459 [1100/1875]
loss: 0.501630 [1200/1875]
loss: 0.593385 [1300/1875]
loss: 0.398698 [1400/1875]
loss: 0.774741 [1500/1875]
loss: 0.300537 [1600/1875]
loss: 0.626626 [1700/1875]
loss: 0.468578 [1800/1875]
Test:
Accuracy: 80.9%, Avg loss: 0.520291
Epoch 9
-------------------------------
loss: 0.393862 [ 0/1875]
loss: 0.242075 [100/1875]
loss: 0.380720 [200/1875]
loss: 0.613354 [300/1875]
loss: 0.313823 [400/1875]
loss: 0.708381 [500/1875]
loss: 0.501862 [600/1875]
loss: 0.367467 [700/1875]
loss: 0.601890 [800/1875]
loss: 0.417870 [900/1875]
loss: 0.465760 [1000/1875]
loss: 0.626054 [1100/1875]
loss: 0.564844 [1200/1875]
loss: 0.439832 [1300/1875]
loss: 0.247064 [1400/1875]
loss: 0.210835 [1500/1875]
loss: 0.672299 [1600/1875]
loss: 0.483748 [1700/1875]
loss: 0.694968 [1800/1875]
Test:
Accuracy: 83.5%, Avg loss: 0.455580
Epoch 10
-------------------------------
loss: 0.391544 [ 0/1875]
loss: 0.305324 [100/1875]
loss: 0.416478 [200/1875]
loss: 0.437293 [300/1875]
loss: 0.253441 [400/1875]
loss: 0.500829 [500/1875]
loss: 0.511331 [600/1875]
loss: 0.390801 [700/1875]
loss: 0.466885 [800/1875]
loss: 0.345322 [900/1875]
loss: 0.454487 [1000/1875]
loss: 0.409122 [1100/1875]
loss: 0.191410 [1200/1875]
loss: 0.507366 [1300/1875]
loss: 0.580812 [1400/1875]
loss: 0.367076 [1500/1875]
loss: 0.314957 [1600/1875]
loss: 0.409756 [1700/1875]
loss: 0.368590 [1800/1875]
Test:
Accuracy: 84.0%, Avg loss: 0.436875
Done!
Note in the log above how plain SGD hovers near a loss of 2.30 for the first three epochs before it starts making progress. Experiment 2: keeping all other variables unchanged, use the SGD optimizer with the momentum parameter set to 0.9.
import csv
from mindvision.classification.models import lenet
from mindspore import nn

network2 = lenet(num_classes=10, num_channel=1, include_top=True)
net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
# Define the optimizer
net_opt = nn.SGD(network2.trainable_params(), learning_rate=1e-2, momentum=0.9)
epochs = 10
csv_file2 = open('result/sgd2.csv', 'w', newline='')
writer2 = csv.writer(csv_file2)
writer2.writerow(['Accuracy', 'Avg_loss'])
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(network2, train_dataset, net_loss, net_opt)
    test(network2, test_dataset, net_loss, writer2)
csv_file2.close()
print("Done!")
Experiment 3: keeping all other variables unchanged, use the SGD optimizer with momentum set to 0.9 and nesterov set to True.
import csv
from mindvision.classification.models import lenet
from mindspore import nn

network3 = lenet(num_classes=10, num_channel=1, include_top=True)
net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
# Define the optimizer
net_opt = nn.SGD(network3.trainable_params(), learning_rate=1e-2, momentum=0.9, nesterov=True)
epochs = 10
csv_file3 = open('result/sgd3.csv', 'w', newline='')
writer3 = csv.writer(csv_file3)
writer3.writerow(['Accuracy', 'Avg_loss'])
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(network3, train_dataset, net_loss, net_opt)
    test(network3, test_dataset, net_loss, writer3)
csv_file3.close()
print("Done!")
Visualization and Analysis
Plotting
By this point we have all the experimental data, so we now use matplotlib to visualize and analyze it.
First, plot the accuracy data.
import csv
import matplotlib.pyplot as plt
import numpy as np

f = open('result/sgd1.csv')  # open the csv file
reader = csv.reader(f)       # read the csv file
data1 = list(reader)         # convert the csv rows to a list
f.close()
f = open('result/sgd2.csv')
reader = csv.reader(f)
data2 = list(reader)
f.close()
f = open('result/sgd3.csv')
reader = csv.reader(f)
data3 = list(reader)
f.close()

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y1 = list()
y2 = list()
y3 = list()
for i in range(1, 11):  # start from the second row (skip the header)
    y1.append(float(data1[i][0]))
    y2.append(float(data2[i][0]))
    y3.append(float(data3[i][0]))

plt.plot(x, y1, color='g', linestyle='-', label='SGD')
plt.plot(x, y2, color='b', linestyle='-.', label='SGD+momentum')
plt.plot(x, y3, color='r', linestyle='--', label='SGD+momentum+nesterov')
plt.xlabel('epoch')
plt.ylabel('Accuracy')
plt.title('Accuracy graph')
plt.xlim((0, 10))
plt.ylim((0, 100))
my_y_ticks = np.arange(0, 100, 10)
plt.xticks(x)
plt.yticks(my_y_ticks)
plt.legend()
plt.savefig("sgd_accuracy_comparison.png")
plt.show()  # display the line chart
This is the result of the author's training run: [figure: accuracy comparison of the three SGD settings]
Next, plot the average-loss data.
import csv
import matplotlib.pyplot as plt
import numpy as np

f = open('result/sgd1.csv')  # open the csv file
reader = csv.reader(f)       # read the csv file
data1 = list(reader)         # convert the csv rows to a list
f.close()
f = open('result/sgd2.csv')
reader = csv.reader(f)
data2 = list(reader)
f.close()
f = open('result/sgd3.csv')
reader = csv.reader(f)
data3 = list(reader)
f.close()

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y1 = list()
y2 = list()
y3 = list()
for i in range(1, 11):  # start from the second row (skip the header)
    y1.append(float(data1[i][1]))
    y2.append(float(data2[i][1]))
    y3.append(float(data3[i][1]))

plt.plot(x, y1, color='g', linestyle='-', label='SGD')
plt.plot(x, y2, color='b', linestyle='-.', label='SGD+momentum')
plt.plot(x, y3, color='r', linestyle='--', label='SGD+momentum+nesterov')
plt.xlabel('epoch')
plt.ylabel('Avg_loss')
plt.title('Avg_loss graph')
plt.xlim((0, 10))
plt.ylim((0, 2.5))
my_y_ticks = np.arange(0, 3, 0.2)
plt.xticks(x)
plt.yticks(my_y_ticks)
plt.legend()
plt.savefig("sgd_loss_comparison.png")
plt.show()  # display the line chart
This is the result of the author's training run: [figure: average-loss comparison of the three SGD settings]