Preface: while computing the gradient, I wrote `weight_gradients = 2 * np.mean(np.dot(errors*y_hat*(1-y_hat)),features)` directly and found that the result did not match the expected answer.
1. Problem Description
The task: write a Python function that simulates a single neuron with sigmoid activation and implements backpropagation to update the neuron's weights and bias. The function should use gradient descent based on MSE loss to update the weights and bias, and return the updated weights, the updated bias, and a list of the MSE value for each epoch, each rounded to four decimal places. (Deep-ML | Problem)
I jumped straight to `weight_gradients = 2 * np.mean(np.dot(errors*y_hat*(1-y_hat)),features)` to get the gradient. In everyday coding we rarely implement these basic operations ourselves; most of the time we just call a ready-made API to get the result. The moment I tried to implement one by hand, the bugs piled up.
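What that line is supposed to compute is easy to state on paper. With MSE loss over $n$ samples, a sigmoid activation, and the same notation as the code (`errors` $= \hat{y} - y$), the chain rule gives:

$$
L=\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2,\qquad \hat{y}_i=\sigma(\mathbf{w}\cdot\mathbf{x}_i+b),\qquad \sigma'(z)=\sigma(z)\bigl(1-\sigma(z)\bigr)
$$

$$
\frac{\partial L}{\partial \mathbf{w}}=\frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)\,\hat{y}_i(1-\hat{y}_i)\,\mathbf{x}_i,\qquad
\frac{\partial L}{\partial b}=\frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)\,\hat{y}_i(1-\hat{y}_i)
$$

The key point: the weight gradient keeps one component per feature, so the per-sample factor $(\hat{y}_i-y_i)\,\hat{y}_i(1-\hat{y}_i)$ must be multiplied by each sample's feature vector and averaged over the sample axis only, never reduced to a single scalar.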
2. Solution
Incorrect code:
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_neuron(features: np.ndarray, labels: np.ndarray, initial_weights: np.ndarray,
                 initial_bias: float, learning_rate: float, epochs: int) -> tuple[np.ndarray, float, list[float]]:
    # Make sure the inputs are numpy arrays
    features = np.asarray(features)
    labels = np.asarray(labels)
    weights = np.asarray(initial_weights)
    bias = initial_bias
    losses = []
    for _ in range(epochs):
        z = np.dot(features, weights) + bias
        y_hat = sigmoid(z)
        errors = y_hat - labels
        loss = np.mean(errors ** 2)
        # BUG: the misplaced parenthesis gives np.dot() only one argument,
        # and `features` lands in np.mean() as its axis parameter
        weight_gradients = 2 * np.mean(np.dot(errors*y_hat*(1-y_hat)),features)
        bias_gradient = 2 * np.mean(errors * y_hat * (1 - y_hat))
        weights -= learning_rate * weight_gradients
        bias -= learning_rate * bias_gradient
        losses.append(round(loss, 4))
    updated_weights = np.round(weights, 4)
    updated_bias = round(bias, 4)
    return updated_weights, updated_bias, losses
```
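That one line actually hides two separate mistakes. As written, `np.dot()` receives only a single argument (it needs two), so the call raises a `TypeError` before `np.mean()` even runs, with `features` landing in `np.mean()` as its `axis` parameter. And even with the parenthesis moved to where it was presumably intended, `np.mean(np.dot(errors*y_hat*(1-y_hat), features))` averages over everything and returns one scalar, while the weight gradient needs one entry per feature. A small shape experiment (the stand-in arrays below are made up purely for illustration):

```python
import numpy as np

# Assumed stand-ins: n = 4 samples, m = 2 features
n, m = 4, 2
delta = np.full(n, 0.25)                          # plays the role of errors * y_hat * (1 - y_hat), shape (n,)
X = np.arange(n * m, dtype=float).reshape(n, m)   # plays the role of features, shape (n, m)

# np.dot(delta, X) contracts over the sample axis, leaving one value per weight:
print(np.dot(delta, X).shape)                     # (2,)

# ...but a bare np.mean() around it collapses that to a single scalar, so even
# 2 * np.mean(np.dot(delta, X)) cannot correctly update an m-element weight vector:
print(np.mean(np.dot(delta, X)))                  # scalar

# The correct reduction averages over samples only and keeps the per-feature axis:
print(2 / n * np.dot(X.T, delta))                     # shape (2,)
print(2 * np.mean(delta[:, np.newaxis] * X, axis=0))  # same values, shape (2,)
```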
Correct code:
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_neuron(features: np.ndarray, labels: np.ndarray, initial_weights: np.ndarray,
                 initial_bias: float, learning_rate: float, epochs: int) -> tuple[np.ndarray, float, list[float]]:
    # Make sure the inputs are numpy arrays; copy the weights as float so the
    # in-place update below works and doesn't clobber the caller's array
    features = np.asarray(features)
    labels = np.asarray(labels)
    weights = np.array(initial_weights, dtype=float)
    bias = initial_bias
    losses = []
    for _ in range(epochs):
        z = np.dot(features, weights) + bias
        y_hat = sigmoid(z)
        errors = y_hat - labels
        loss = np.mean(errors ** 2)
        # Broadcast the per-sample delta over the feature columns,
        # then average over the sample axis (axis=0)
        weight_gradients = 2 * np.mean((errors * y_hat * (1 - y_hat))[:, np.newaxis] * features, axis=0)
        bias_gradient = 2 * np.mean(errors * y_hat * (1 - y_hat))
        # Equivalent vectorized form:
        # weight_gradients = 2 / len(labels) * np.dot(features.T, errors * y_hat * (1 - y_hat))
        # bias_gradient = 2 / len(labels) * np.sum(errors * y_hat * (1 - y_hat))
        weights -= learning_rate * weight_gradients
        bias -= learning_rate * bias_gradient
        losses.append(round(loss, 4))
    updated_weights = np.round(weights, 4)
    updated_bias = round(bias, 4)
    return updated_weights, updated_bias, losses
```
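A quick sanity check on a tiny made-up dataset (the numbers below are illustrative, not the Deep-ML test case):

```python
# Illustrative inputs only -- not the official test case
features = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
labels = np.array([1.0, 0.0, 0.0])
initial_weights = np.array([0.1, -0.2])

weights, bias, mse_per_epoch = train_neuron(features, labels, initial_weights,
                                            initial_bias=0.0, learning_rate=0.1, epochs=3)
print(weights)         # updated weights, rounded to 4 decimals
print(bias)            # updated bias, rounded to 4 decimals
print(mse_per_epoch)   # one MSE value per epoch
```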
`matrix[:, np.newaxis]` inserts a new axis (dimension) at the specified position (here, along the column direction), turning a 1-D array into a 2-D column vector, similar to `reshape(-1, 1)`.
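A minimal illustration:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])                   # shape (3,)
col = v[:, np.newaxis]                          # shape (3, 1): a 2-D column vector
print(col.shape)                                # (3, 1)
print(np.array_equal(col, v.reshape(-1, 1)))    # True -- same result as reshape(-1, 1)

# Broadcasting a (3, 1) column against a (3, 2) matrix scales each row,
# which is exactly what the weight-gradient line above relies on:
X = np.ones((3, 2))
print((col * X).shape)                          # (3, 2)
```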
In PyTorch, the same job is done with `torch.unsqueeze()`, which inserts a new dimension of length 1 at the specified position (analogous to NumPy's `np.newaxis`).
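For example:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])               # shape (3,)
print(t.unsqueeze(1).shape)                     # torch.Size([3, 1]): new length-1 dim at position 1
print(t.unsqueeze(0).shape)                     # torch.Size([1, 3]): new length-1 dim at position 0
print(torch.equal(t.unsqueeze(1), t[:, None]))  # True -- t[:, None] is the NumPy-style equivalent
```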