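This case study comes from the Week 3 programming assignment of Course 1, Neural Networks and Deep Learning, in Andrew Ng's Deep Learning series. The task is to build a planar data classifier with one hidden layer. The assignment provides test cases (testCases.py) and a utility package (planar_utils.py); both can be downloaded from the first link in the references.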
Contents
- 1 Introduction
  - 1.1 Core formulas used in this case
  - 1.2 Libraries and main interfaces
- 2 Coding
  - 2.1 Inspecting the dataset parameters
  - 2.2 Results of simple logistic regression
  - 2.3 Building the neural network
  - 2.4 Varying the number of hidden-layer units
- 3 Debugging
  - 3.1 Runtime warnings
  - 3.2 Deprecation warning
  - 3.3 Data conversion warning
- 4 References
1 Introduction
1.1 Core formulas used in this case
Gradient descent for a two-layer neural network:

$$dW^{1} = \frac{dJ}{dW^{1}},\quad db^{1} = \frac{dJ}{db^{1}},\quad dW^{2} = \frac{dJ}{dW^{2}},\quad db^{2} = \frac{dJ}{db^{2}}$$

$$W^{1} \implies W^{1} - \alpha\, dW^{1},\quad b^{1} \implies b^{1} - \alpha\, db^{1}$$

$$W^{2} \implies W^{2} - \alpha\, dW^{2},\quad b^{2} \implies b^{2} - \alpha\, db^{2}$$
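The update rule can be illustrated with a minimal sketch. The helper name `gradient_step` and the toy values are assumptions made up for this illustration; they are not part of the assignment code.

```python
import numpy as np

def gradient_step(params, grads, alpha):
    """Return new parameters after one update: theta <- theta - alpha * dtheta."""
    return {name: value - alpha * grads["d" + name] for name, value in params.items()}

# Toy values, chosen only to make the arithmetic easy to follow
params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[0.05]])}

updated = gradient_step(params, grads, alpha=0.5)
print(updated["W1"])  # each weight moves against its gradient
print(updated["b1"])
```

With `alpha=0.5`, the weights become 0.95 and 2.1 and the bias becomes 0.475, which matches applying the formula by hand.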
Forward propagation proceeds as follows:

$$\underset{(n^{1}, m)}{Z^{1}} = \underset{(n^{1}, n^{0})}{W^{1}}\underset{(n^{0}, m)}{X} + \underset{(n^{1}, 1)\rightarrow(n^{1}, m)}{b^{1}}$$

$$\underset{(n^{1}, m)}{A^{1}} = g^{1}(Z^{1})$$

$$\underset{(n^{2}, m)}{Z^{2}} = \underset{(n^{2}, n^{1})}{W^{2}}\underset{(n^{1}, m)}{A^{1}} + \underset{(n^{2}, 1)\rightarrow(n^{2}, m)}{b^{2}}$$

$$\underset{(n^{2}, m)}{A^{2}} = g^{2}(Z^{2}) = \sigma(Z^{2})$$
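The shape annotations can be checked with a small sketch. The layer sizes, the sample count, and the random inputs below are illustrative assumptions; the choice of tanh for $g^{1}$ follows the code in section 2.3.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, n2, m = 2, 4, 1, 5  # illustrative layer sizes and number of examples

X = rng.standard_normal((n0, m))
W1 = rng.standard_normal((n1, n0)) * 0.01
b1 = np.zeros((n1, 1))
W2 = rng.standard_normal((n2, n1)) * 0.01
b2 = np.zeros((n2, 1))

Z1 = W1 @ X + b1                # b1 of shape (n1, 1) is broadcast to (n1, m)
A1 = np.tanh(Z1)                # g1 = tanh
Z2 = W2 @ A1 + b2               # b2 of shape (n2, 1) is broadcast to (n2, m)
A2 = 1.0 / (1.0 + np.exp(-Z2))  # g2 = sigmoid

print(Z1.shape, A1.shape, Z2.shape, A2.shape)
```

The broadcast of the bias columns is what the arrows $(n^{1}, 1)\rightarrow(n^{1}, m)$ in the formulas describe.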
Back propagation proceeds as follows, where $*$ denotes element-wise multiplication:

$$\underset{(n^{2}, m)}{dZ^{2}} = A^{2} - Y$$

$$\underset{(n^{2}, n^{1})}{dW^{2}} = \frac{1}{m}\underset{(n^{2}, m)}{dZ^{2}}\underset{(m, n^{1})}{A^{1T}}$$

$$\underset{(n^{2}, 1)}{db^{2}} = \frac{1}{m}\,\text{np.sum}(dZ^{2},\ \text{axis}=1,\ \text{keepdims}=\text{True})$$

$$\underset{(n^{1}, m)}{dZ^{1}} = \underset{(n^{1}, n^{2})}{W^{2T}}\underset{(n^{2}, m)}{dZ^{2}} * \underset{(n^{1}, m)}{{g^{1}}'(Z^{1})}$$

$$\underset{(n^{1}, n^{0})}{dW^{1}} = \frac{1}{m}\underset{(n^{1}, m)}{dZ^{1}}\underset{(m, n^{0})}{X^{T}}$$

$$\underset{(n^{1}, 1)}{db^{1}} = \frac{1}{m}\,\text{np.sum}(dZ^{1},\ \text{axis}=1,\ \text{keepdims}=\text{True})$$
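One way to gain confidence in these formulas is a numerical gradient check. The sketch below is not part of the assignment: it assumes a tanh hidden layer, a single sigmoid output, and small random data, and compares the analytic $dW^{1}$ with a central-difference estimate of the cost gradient.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
    return A1, A2

def cost(X, Y, W1, b1, W2, b2):
    _, A2 = forward(X, W1, b1, W2, b2)
    # Cross-entropy averaged over the m examples (single output unit)
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

rng = np.random.default_rng(1)
n0, n1, m = 2, 3, 6  # illustrative sizes
X = rng.standard_normal((n0, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.standard_normal((n1, n0)) * 0.5
b1 = np.zeros((n1, 1))
W2 = rng.standard_normal((1, n1)) * 0.5
b2 = np.zeros((1, 1))

# Analytic gradient of the cost with respect to W1, using the formulas above
A1, A2 = forward(X, W1, b1, W2, b2)
dZ2 = A2 - Y
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)  # derivative of tanh is 1 - tanh^2
dW1 = dZ1 @ X.T / m

# Numerical gradient by central differences, one entry of W1 at a time
eps = 1e-6
num_dW1 = np.zeros_like(W1)
for idx in np.ndindex(W1.shape):
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    num_dW1[idx] = (cost(X, Y, W_plus, b1, W2, b2) - cost(X, Y, W_minus, b1, W2, b2)) / (2 * eps)

max_diff = float(np.max(np.abs(num_dW1 - dW1)))
print("largest difference:", max_diff)
```

A very small difference indicates that the analytic expression and the finite-difference estimate agree; the same loop can be repeated for the other parameters.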
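1.2 Libraries and main interfaces
numpy: the fundamental package for scientific computing with Python.
numpy.round: rounds evenly to the given number of decimals; a value exactly halfway between two candidates goes to the nearest even value.
numpy.random.seed: sets the seed of the random number generator so that random sequences are reproducible.
sklearn: a machine learning library for data mining and data analysis.
sklearn.linear_model.LogisticRegressionCV: logistic regression CV (also known as logit or MaxEnt) classifier.
matplotlib: a library for plotting charts in Python.
matplotlib.pyplot.scatter: a scatter plot of y versus x with varying marker size and/or color.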
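Two of these interfaces have behavior that is easy to misread, so here is a short illustrative sketch; the input values are arbitrary.

```python
import numpy as np

# numpy.round sends exact halves to the nearest even value, not always upward
halves = np.round([0.5, 1.5, 2.5])
print(halves)

# A sigmoid output such as 0.731 rounds to the class label 1.0
print(np.round(0.731))

# Re-seeding with the same value reproduces the same random draws
np.random.seed(1)
first = np.random.rand(3)
np.random.seed(1)
second = np.random.rand(3)
print(np.array_equal(first, second))
```

The halves round to 0, 2 and 2. This matters for the `predict` function in section 2.3 only when an output is exactly 0.5, which is then mapped to class 0.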
2 Coding
2.1 Inspecting the dataset parameters
check_data.py
```python
import matplotlib.pyplot as plt
from planar_utils import load_planar_dataset

X, Y = load_planar_dataset()  # load the data

# Scatter plot of the data
plt.scatter(X[0, :], X[1, :], c=Y, s=40, cmap=plt.cm.Spectral)
plt.show()

# Compute and print the dataset parameters
print("Shape of X: " + str(X.shape))
print("Shape of Y: " + str(Y.shape))
print("Number of examples in the dataset: " + str(Y.shape[1]))
```


2.2 Results of simple logistic regression
logic_nn.py
```python
import numpy as np
import matplotlib.pyplot as plt
import sklearn.linear_model
from planar_utils import plot_decision_boundary, load_planar_dataset

X, Y = load_planar_dataset()

# Build and train the model
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T, Y.T)  # Y.T is a column vector; section 3.3 covers the warning this raises

# Predict with the trained model
predictions = clf.predict(X.T)
# Count matching 1s and matching 0s, then divide by the number of examples
correct_predictions = ((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size)).reshape(1,)
print('Accuracy: %.2f' % (correct_predictions[0] * 100) + '%')

# Plot the decision boundary as colored regions
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")
plt.show()
```

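In testing, the accuracy was only 47%. The dataset is not linearly separable, so logistic regression performs poorly.
Linearly separable: there exists a hyperplane in feature space that completely separates the data points of different classes. In two dimensions the hyperplane is a straight line.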
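The accuracy figure comes from the dot-product expression in logic_nn.py, which adds the number of correctly predicted 1s to the number of correctly predicted 0s. The sketch below uses made-up labels and predictions to show that it agrees with a plain element-wise comparison.

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0]])          # labels, shape (1, m)
predictions = np.array([1, 0, 0, 1, 1])  # shape (m,), the form clf.predict returns

# Matching 1s plus matching 0s
matches = np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)
accuracy_dot = float(matches[0]) / Y.size

# Equivalent: fraction of positions where prediction equals label
accuracy_mean = float(np.mean(predictions == Y.ravel()))

print(accuracy_dot, accuracy_mean)
```

Three of the five toy predictions are correct, so both expressions give 0.6.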
2.3 Building the neural network
double_layer_nn.py
```python
import numpy as np
import matplotlib.pyplot as plt
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset


# Number of units in each layer
def layer_sizes(X, Y):
    n_x = X.shape[0]  # input layer size
    n_h = 4           # hidden layer size
    n_y = Y.shape[0]  # output layer size
    return n_x, n_h, n_y


# Initialize the model parameters
def initialize_parameters(n_x, n_h, n_y):
    # np.random.rand draws from a uniform distribution on [0, 1)
    W1 = np.random.rand(n_h, n_x) * 0.01
    b1 = np.random.rand(n_h, 1)
    W2 = np.random.rand(n_y, n_h) * 0.01
    b2 = np.random.rand(n_y, 1)
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters


# Forward propagation
def forward_propagation(X, parameters):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return cache


# Compute the cost (cross-entropy)
def compute_cost(A2, Y):
    m = Y.shape[1]
    cost = (-1 / m) * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    cost = float(np.squeeze(cost))
    return cost


# Back propagation
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # tanh'(Z1) = 1 - A1^2
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads


# Update the parameters
def update_parameters(parameters, grads, learning_rate=1.2):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters


# Assemble the neural network
def nn_model(X, Y, n_h, num_iterations, learning_rate=0.5, print_cost=False):
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    parameters = initialize_parameters(n_x, n_h, n_y)
    for i in range(num_iterations):
        cache = forward_propagation(X, parameters)
        cost = compute_cost(cache["A2"], Y)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and (i % 1000 == 0):
            print("Iteration %i, cost: %f" % (i, cost))
    return parameters


# Prediction function
def predict(parameters, X):
    cache = forward_propagation(X, parameters)
    predictions = np.round(cache["A2"])
    return predictions


# Train the network and evaluate it
X, Y = load_planar_dataset()
n_h = 4
parameters = nn_model(X, Y, n_h, num_iterations=10000, learning_rate=0.5, print_cost=True)
predictions = predict(parameters, X)
correct_predictions = ((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size)).reshape(1,)
print('Accuracy: %.2f' % (correct_predictions[0] * 100) + '%')

# Plot the decision boundary as colored regions
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size %i : %.2f " % (n_h, correct_predictions[0] * 100) + '%')
plt.show()
```


2.4 Varying the number of hidden-layer units
Vary the number of units in the hidden layer and observe how the predictions change.
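Such an experiment might be organized as a loop over hidden-layer sizes. The sketch below is self-contained and therefore does not use planar_utils: the stand-in dataset (points labeled by whether they fall inside the unit circle), the list of sizes, the helper names `train` and `accuracy`, and the hyperparameters are all assumptions for illustration, so its numbers say nothing about the assignment's dataset.

```python
import numpy as np

def train(X, Y, n_h, num_iterations=2000, learning_rate=0.5, seed=0):
    """Train a one-hidden-layer tanh/sigmoid network and return its parameters."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    W1 = rng.standard_normal((n_h, X.shape[0])) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((1, n_h)) * 0.01
    b2 = np.zeros((1, 1))
    for _ in range(num_iterations):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass (both dZ terms are computed before any update)
        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        # Gradient-descent updates
        W2 -= learning_rate * (dZ2 @ A1.T) / m
        b2 -= learning_rate * dZ2.mean(axis=1, keepdims=True)
        W1 -= learning_rate * (dZ1 @ X.T) / m
        b1 -= learning_rate * dZ1.mean(axis=1, keepdims=True)
    return W1, b1, W2, b2

def accuracy(params, X, Y):
    W1, b1, W2, b2 = params
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ np.tanh(W1 @ X + b1) + b2)))
    return float(np.mean(np.round(A2) == Y))

# Hypothetical stand-in data: label 1 inside the unit circle, 0 outside
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(2, 200))
Y = (np.sum(X ** 2, axis=0, keepdims=True) < 1.0).astype(float)

results = {}
for n_h in [1, 2, 3, 4, 5, 20]:  # arbitrary sizes to compare
    results[n_h] = accuracy(train(X, Y, n_h), X, Y)
    print("hidden units: %d, accuracy: %.2f%%" % (n_h, results[n_h] * 100))
```

With the assignment's files available, the same loop could call `nn_model` and `predict` from section 2.3 on the planar dataset instead.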
3 Debugging
3.1 Runtime warnings
E:\pythonPrograming\DLHomework\course1-week3\double_layer_nn.py:53: RuntimeWarning: divide by zero encountered in log
```python
logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
```
E:\pythonPrograming\DLHomework\course1-week3\planar_utils.py:25: RuntimeWarning: overflow encountered in exp
```python
s = 1/(1+np.exp(-x))
```
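Both warnings appear when values saturate: `np.log` receives exactly 0 once `A2` reaches 0 or 1, and `np.exp(-x)` overflows when `-x` is a large positive number. One common workaround, shown here as a sketch rather than as the fix used in the assignment, is to clip the activations before taking the logarithm and to evaluate the sigmoid in a form that never exponentiates a large positive argument. The names `stable_sigmoid` and `safe_cost` and the value of `eps` are assumptions.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid evaluated so that exp only ever sees non-positive arguments."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

def safe_cost(A2, Y, eps=1e-12):
    """Cross-entropy cost with A2 clipped away from 0 and 1 before the log."""
    A2 = np.clip(A2, eps, 1.0 - eps)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

z = np.array([[-1000.0, 0.0, 1000.0]])  # extreme inputs that trigger the warnings
a = stable_sigmoid(z)
print(a)
print(safe_cost(a, np.array([[1.0, 1.0, 0.0]])))
```

Clipping changes the cost slightly for saturated outputs, so it is a trade-off between exactness and a finite, warning-free value.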
3.2 Deprecation warning
E:\pythonPrograming\DLHomework\course1-week3\double_layer_nn.py:135: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
```python
print('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
```
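The warning arises because the dot products return an array with more than zero dimensions, which is then passed to `float()`. As the message suggests, extracting the single element first avoids the implicit conversion. The sketch below uses made-up labels and predictions and the `item()` method; indexing with `[0, 0]` would work equally well.

```python
import numpy as np

Y = np.array([[1, 0, 1, 1]])            # labels, shape (1, m)
predictions = np.array([[1, 0, 0, 1]])  # network output after rounding, shape (1, m)

score = (np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100
print(score.shape)  # the result is still a 2-D array with one element

accuracy = score.item()  # explicitly extract the single element as a Python float
print('Accuracy: %d' % accuracy + '%')
```

The scripts in sections 2.2 and 2.3 take a similar route by reshaping the result to one dimension and indexing element 0.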
3.3 Data conversion warning
E:\pythonPrograming\DLHomework\course1-week3.venv\Lib\site-packages\sklearn\utils\validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
```python
clf.fit(X.T, Y.T)
```
This warning is raised in the logistic regression script logic_nn.py. The supplied Y is a two-dimensional array, whereas the function expects a one-dimensional array, so sklearn converts it and warns. Calling ravel() or flatten() on Y converts it to a one-dimensional array before fitting.
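The shape change can be seen with a small NumPy-only sketch. The label values are made up, and the final call to `clf.fit` is left as a comment because it needs the objects from logic_nn.py.

```python
import numpy as np

Y = np.array([[0, 1, 1, 0]])  # shape (1, m), like the labels in this assignment

column = Y.T            # shape (m, 1): the column vector that triggers the warning
flat_view = Y.ravel()   # shape (m,): returns a view when possible
flat_copy = Y.flatten() # shape (m,): always returns a copy

print(column.shape, flat_view.shape, flat_copy.shape)

# In logic_nn.py the fit call would then become:
# clf.fit(X.T, Y.ravel())
```

Either method gives the `(n_samples,)` shape named in the warning message; `ravel()` avoids a copy when the memory layout allows it.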
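4 References
【中文】【吴恩达课后编程作业】Course 1 - 神经网络和深度学习 - 第三周作业 (CSDN blog; in Chinese: Andrew Ng programming assignment, Course 1, Neural Networks and Deep Learning, Week 3)
NumPy reference, NumPy v2.1 Manual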







