🧠 Four-Layer Neural Network Example (with Backpropagation)
📘 Network Architecture
| Layer | Name | Neurons |
|---|---|---|
| L₁ | Input layer | 2 |
| L₂ | Hidden layer 1 | 2 |
| L₃ | Hidden layer 2 | 2 |
| L₄ | Output layer | 1 |

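Activation function: Sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$
Loss function: mean squared error (MSE), written for a single sample with a factor of $\frac{1}{2}$: $L = \frac{1}{2}(y_{true} - \hat{y})^2$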
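Below is a minimal Python sketch of these two definitions. It uses only the standard library. The function names are illustrative choices and do not come from any library. The sigmoid derivative is written in terms of the activation itself, $\sigma'(z) = \sigma(z)(1-\sigma(z))$, because that is the form the backward pass uses.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad_from_output(a: float) -> float:
    """Sigmoid derivative expressed through its output a = sigmoid(z): a * (1 - a)."""
    return a * (1.0 - a)

def half_squared_error(y_true: float, y_hat: float) -> float:
    """L = 1/2 (y_true - y_hat)^2; its derivative w.r.t. y_hat is y_hat - y_true."""
    return 0.5 * (y_true - y_hat) ** 2

print(round(sigmoid(0.8), 5), half_squared_error(1.0, 0.8))
```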
⚙️ I. Forward Propagation
Input (row vector $X = [x_1,\ x_2]$):
$$x_1 = 1,\quad x_2 = 2$$
Target output:
$$y_{true} = 1$$
1️⃣ Weights and Biases
| Layer | Weight matrix | Bias |
|---|---|---|
| L1→L2 | $W_1 = \begin{bmatrix}0.1 & 0.2\\0.3 & 0.4\end{bmatrix}$ | $b_1 = [0.1,\ 0.2]$ |
| L2→L3 | $W_2 = \begin{bmatrix}0.5 & 0.6\\0.7 & 0.8\end{bmatrix}$ | $b_2 = [0.1,\ 0.2]$ |
| L3→L4 | $W_3 = \begin{bmatrix}0.9\\1.0\end{bmatrix}$ | $b_3 = [0.3]$ |

2️⃣ Layer-by-Layer Computation
Hidden layer 1 (L₂)
$$z^{(2)} = XW_1 + b_1 = [1,\ 2]\begin{bmatrix}0.1 & 0.2\\0.3 & 0.4\end{bmatrix} + [0.1,\ 0.2] = [0.7,\ 1.0] + [0.1,\ 0.2] = [0.8,\ 1.2]$$
Activation:
$$a^{(2)} = \sigma(z^{(2)}) = [0.68997,\ 0.76852]$$
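Here is the same step as a plain-Python sketch. It assumes the row-vector convention used above ($z = XW + b$), so `W[i][j]` connects input neuron `i` to neuron `j` of the next layer. The helper `dense` is a hypothetical name introduced for this illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, W, b):
    """Row vector times matrix plus bias: z_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

X = [1.0, 2.0]
W1 = [[0.1, 0.2],
      [0.3, 0.4]]
b1 = [0.1, 0.2]

z2 = dense(X, W1, b1)            # approximately [0.8, 1.2]
a2 = [sigmoid(z) for z in z2]    # approximately [0.68997, 0.76852]
print([round(v, 5) for v in a2])
```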
Hidden layer 2 (L₃)
$$z^{(3)} = a^{(2)}W_2 + b_2 = [0.68997,\ 0.76852]\begin{bmatrix}0.5 & 0.6\\0.7 & 0.8\end{bmatrix} + [0.1,\ 0.2] = [0.88295,\ 1.02880] + [0.1,\ 0.2] = [0.98295,\ 1.22880]$$
Activation:
$$a^{(3)} = \sigma(z^{(3)}) = [0.72769,\ 0.77361]$$
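Here is a sketch of the hidden-layer-2 step. It assumes the $a^{(2)}$ values from the previous step, hard-coded to eight decimal places so the snippet runs on its own.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

a2 = [0.68997448, 0.76852478]   # output of hidden layer 1
W2 = [[0.5, 0.6],
      [0.7, 0.8]]
b2 = [0.1, 0.2]

# z3_j = sum_i a2_i * W2[i][j] + b2_j  (row vector times matrix, plus bias)
z3 = [sum(a2[i] * W2[i][j] for i in range(2)) + b2[j] for j in range(2)]
a3 = [sigmoid(z) for z in z3]
print([round(v, 5) for v in z3], [round(v, 5) for v in a3])
```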
Output layer (L₄)
$$z^{(4)} = a^{(3)}W_3 + b_3 = [0.72769,\ 0.77361]\begin{bmatrix}0.9\\1.0\end{bmatrix} + 0.3 = 0.65492 + 0.77361 + 0.3 = 1.72853$$
Activation:
$$\hat{y} = \sigma(1.72853) = 0.84922$$
✅ Forward Pass Results
| Layer | Activation output |
|---|---|
| $a^{(2)}$ | [0.68997, 0.76852] |
| $a^{(3)}$ | [0.72769, 0.77361] |
| $a^{(4)} = \hat{y}$ | 0.84922 |

Loss:
$$L = \frac{1}{2}(y_{true} - \hat{y})^2 = \frac{1}{2}(1 - 0.84922)^2 = 0.01137$$
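Here is a sketch of the output layer and the loss. It starts from the $a^{(3)}$ values above, hard-coded to eight decimal places, and reproduces $z^{(4)}$, $\hat{y}$, and $L$ up to rounding.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

a3 = [0.72769407, 0.77360926]   # output of hidden layer 2
W3 = [0.9, 1.0]                 # L3→L4 weight column, one entry per L3 neuron
b3 = 0.3
y_true = 1.0

z4 = a3[0] * W3[0] + a3[1] * W3[1] + b3
y_hat = sigmoid(z4)
loss = 0.5 * (y_true - y_hat) ** 2   # half squared error for one sample
print(round(z4, 5), round(y_hat, 5), round(loss, 6))
```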
🔁 II. Backpropagation
1️⃣ Output-Layer Gradients
$$\frac{dL}{d\hat{y}} = \hat{y} - y_{true} = -0.15078$$
$$\frac{d\hat{y}}{dz^{(4)}} = \hat{y}(1 - \hat{y}) = 0.12804$$
$$\frac{dL}{dz^{(4)}} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dz^{(4)}} = -0.019306$$
Weight and bias gradients:
$$dW_3 = a^{(3)T}\frac{dL}{dz^{(4)}} = \begin{bmatrix}0.72769\\0.77361\end{bmatrix}(-0.019306) = \begin{bmatrix}-0.014049\\-0.014935\end{bmatrix}$$
$$db_3 = \frac{dL}{dz^{(4)}} = -0.019306$$
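Here are the output-layer gradients as a sketch. It assumes the forward-pass values $a^{(3)}$ and $\hat{y}$ computed above. Each line mirrors one factor of the chain rule.

```python
# Values carried over from the forward pass (eight decimal places)
a3 = [0.72769407, 0.77360926]
y_hat = 0.84922481
y_true = 1.0

dL_dyhat = y_hat - y_true            # derivative of 1/2 (y_true - y_hat)^2 w.r.t. y_hat
dyhat_dz4 = y_hat * (1.0 - y_hat)    # sigmoid derivative expressed through its output
dz4 = dL_dyhat * dyhat_dz4           # chain rule: dL/dz4

dW3 = [a * dz4 for a in a3]          # a3^T · dL/dz4, one entry per row of W3
db3 = dz4                            # the bias gradient equals dL/dz4
print(round(dz4, 6), [round(g, 6) for g in dW3])
```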
2️⃣ Backpropagating to Hidden Layer 2 (L₃)
$$\frac{dL}{da^{(3)}} = \frac{dL}{dz^{(4)}}W_3^T = -0.019306 \cdot [0.9,\ 1.0] = [-0.017375,\ -0.019306]$$
$$\frac{da^{(3)}}{dz^{(3)}} = a^{(3)}(1 - a^{(3)}) = [0.19816,\ 0.17514]$$
$$\frac{dL}{dz^{(3)}} = \frac{dL}{da^{(3)}} \odot \frac{da^{(3)}}{dz^{(3)}} = [-0.003443,\ -0.003381]$$
Gradients:
$$dW_2 = a^{(2)T}\frac{dL}{dz^{(3)}} = \begin{bmatrix}0.68997\\0.76852\end{bmatrix}[-0.003443,\ -0.003381] = \begin{bmatrix}-0.002376 & -0.002333\\-0.002646 & -0.002598\end{bmatrix}$$
$$db_2 = \frac{dL}{dz^{(3)}} = [-0.003443,\ -0.003381]$$
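Here is a sketch of the step into hidden layer 2. It assumes $a^{(2)}$, $a^{(3)}$, and $dL/dz^{(4)}$ from the earlier steps. Because $W_3$ is a single column, $\frac{dL}{dz^{(4)}}W_3^T$ reduces to scaling each weight by the scalar $dL/dz^{(4)}$.

```python
a2 = [0.68997448, 0.76852478]    # hidden layer 1 activations
a3 = [0.72769407, 0.77360926]    # hidden layer 2 activations
W3 = [0.9, 1.0]                  # L3→L4 weights, one per L3 neuron
dz4 = -0.01930556                # dL/dz4 from the output layer

da3 = [w * dz4 for w in W3]                                    # dL/da3 = dL/dz4 · W3^T
dz3 = [da3[j] * a3[j] * (1.0 - a3[j]) for j in range(2)]       # elementwise sigmoid derivative
dW2 = [[a2[i] * dz3[j] for j in range(2)] for i in range(2)]   # outer product a2^T · dz3
db2 = list(dz3)
print([round(g, 6) for g in dz3])
```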
3️⃣ Backpropagating to Hidden Layer 1 (L₂)
$$\frac{dL}{da^{(2)}} = \frac{dL}{dz^{(3)}}W_2^T = [-0.003443,\ -0.003381]\begin{bmatrix}0.5 & 0.7\\0.6 & 0.8\end{bmatrix} = [-0.003750,\ -0.005115]$$
$$\frac{da^{(2)}}{dz^{(2)}} = a^{(2)}(1 - a^{(2)}) = [0.21391,\ 0.17789]$$
$$\frac{dL}{dz^{(2)}} = \frac{dL}{da^{(2)}} \odot \frac{da^{(2)}}{dz^{(2)}} = [-0.0008022,\ -0.0009099]$$
Gradients:
$$dW_1 = X^T\frac{dL}{dz^{(2)}} = \begin{bmatrix}1\\2\end{bmatrix}[-0.0008022,\ -0.0009099] = \begin{bmatrix}-0.0008022 & -0.0009099\\-0.001604 & -0.001820\end{bmatrix}$$
$$db_1 = \frac{dL}{dz^{(2)}} = [-0.0008022,\ -0.0009099]$$
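Here is a sketch of the step into hidden layer 1, using the values above. The product $\frac{dL}{dz^{(3)}}W_2^T$ means `da2[i]` sums over row `i` of `W2`, which holds the weights leaving neuron `i`.

```python
X = [1.0, 2.0]
a2 = [0.68997448, 0.76852478]
W2 = [[0.5, 0.6],
      [0.7, 0.8]]
dz3 = [-0.00344295, -0.00338114]   # dL/dz3 from hidden layer 2

# dL/da2 = dL/dz3 · W2^T, i.e. da2[i] = sum_j dz3[j] * W2[i][j]
da2 = [sum(dz3[j] * W2[i][j] for j in range(2)) for i in range(2)]
dz2 = [da2[i] * a2[i] * (1.0 - a2[i]) for i in range(2)]      # elementwise sigmoid derivative
dW1 = [[X[i] * dz2[j] for j in range(2)] for i in range(2)]    # outer product X^T · dz2
db1 = list(dz2)
print([round(g, 7) for g in dz2])
```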
🧩 III. Backpropagation Computational Graph
```
(x1,x2)
   │
   ▼
[Layer1] -----------→ dL/dz2 → dW1
   │
   ▼
[Layer2] -----------→ dL/dz3 → dW2
   │
   ▼
[Output] -----------→ dL/dz4 → dW3
```
Or, as a complete arrow chain:
```
Forward:  X → Z2 → A2 → Z3 → A3 → Z4 → A4(ŷ)
Backward:   ← dZ2 ← dA2 ← dZ3 ← dA3 ← dZ4 ← dA4
```
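The whole forward/backward chain fits in a short, self-contained Python sketch. The names `forward`, `backward`, and `loss_only` are hypothetical. The gradient list follows the order `[dW1, db1, dW2, db2, dW3, db3]`. As a sanity check, the sketch compares the analytic gradient of one weight, `W1[1][0]` (second row, first column), with a central finite difference. If the backward pass is correct, the two should agree to many decimal places.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, W, b):
    # Row-vector convention: z_j = sum_i x_i * W[i][j] + b_j
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def forward(params, X):
    W1, b1, W2, b2, W3, b3 = params
    a2 = [sigmoid(z) for z in dense(X, W1, b1)]      # X  -> Z2 -> A2
    a3 = [sigmoid(z) for z in dense(a2, W2, b2)]     # A2 -> Z3 -> A3
    y_hat = sigmoid(dense(a3, W3, b3)[0])            # A3 -> Z4 -> A4 (y_hat)
    return a2, a3, y_hat

def backward(params, X, y_true):
    """Return (loss, grads); grads follow the same order as params."""
    W1, b1, W2, b2, W3, b3 = params
    a2, a3, y_hat = forward(params, X)
    loss = 0.5 * (y_true - y_hat) ** 2

    dz4 = (y_hat - y_true) * y_hat * (1 - y_hat)                     # dA4 -> dZ4
    dW3 = [[a3[i] * dz4] for i in range(2)]
    db3 = [dz4]

    dz3 = [W3[i][0] * dz4 * a3[i] * (1 - a3[i]) for i in range(2)]    # dZ4 -> dA3 -> dZ3
    dW2 = [[a2[i] * dz3[j] for j in range(2)] for i in range(2)]
    db2 = dz3

    da2 = [sum(dz3[j] * W2[i][j] for j in range(2)) for i in range(2)]  # dZ3 -> dA2
    dz2 = [da2[i] * a2[i] * (1 - a2[i]) for i in range(2)]             # dA2 -> dZ2
    dW1 = [[X[i] * dz2[j] for j in range(2)] for i in range(2)]
    db1 = dz2
    return loss, [dW1, db1, dW2, db2, dW3, db3]

def loss_only(params, X, y_true):
    return 0.5 * (y_true - forward(params, X)[2]) ** 2

params = [
    [[0.1, 0.2], [0.3, 0.4]], [0.1, 0.2],   # W1, b1
    [[0.5, 0.6], [0.7, 0.8]], [0.1, 0.2],   # W2, b2
    [[0.9], [1.0]], [0.3],                  # W3 (2x1), b3
]
X, y_true = [1.0, 2.0], 1.0
loss, grads = backward(params, X, y_true)

# Central finite difference on W1[1][0] as a sanity check of the analytic gradient
h = 1e-6
W1 = params[0]
W1[1][0] += h
up = loss_only(params, X, y_true)
W1[1][0] -= 2 * h
down = loss_only(params, X, y_true)
W1[1][0] += h  # restore the original value
numeric = (up - down) / (2 * h)
print(f"loss={loss:.6f}  analytic={grads[0][1][0]:.7f}  numeric={numeric:.7f}")
```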
✅ IV. Summary
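| Step | Content | Description |
|---|---|---|
| Forward propagation | X → Ŷ | Compute the predicted output |
| Loss computation | L(Ŷ, Y) | Measure the error |
| Backpropagation | dL/dW, dL/db | Compute the gradients |
| Parameter update | W ← W - η·dW | Update the parameters using learning rate η |
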
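Here is a sketch of the update step. It assumes a learning rate $\eta = 0.5$, since the example does not specify one. The gradients are this example's values, rounded to about four or five significant digits. `loss_fn` and `sgd_step` are hypothetical helper names. One step of $W \leftarrow W - \eta \cdot dW$ should lower the loss slightly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(W1, b1, W2, b2, W3, b3, X=(1.0, 2.0), y_true=1.0):
    """Forward pass (row-vector convention z = xW + b), then L = 1/2 (y_true - y_hat)^2."""
    a2 = [sigmoid(X[0] * W1[0][j] + X[1] * W1[1][j] + b1[j]) for j in range(2)]
    a3 = [sigmoid(a2[0] * W2[0][j] + a2[1] * W2[1][j] + b2[j]) for j in range(2)]
    y_hat = sigmoid(a3[0] * W3[0] + a3[1] * W3[1] + b3)
    return 0.5 * (y_true - y_hat) ** 2

def sgd_step(p, g, lr):
    """Elementwise W <- W - lr * dW; recurses into (nested) lists, stops at scalars."""
    if isinstance(p, list):
        return [sgd_step(pi, gi, lr) for pi, gi in zip(p, g)]
    return p - lr * g

params = {
    "W1": [[0.1, 0.2], [0.3, 0.4]], "b1": [0.1, 0.2],
    "W2": [[0.5, 0.6], [0.7, 0.8]], "b2": [0.1, 0.2],
    "W3": [0.9, 1.0],               "b3": 0.3,
}
# Gradients for this example, rounded
grads = {
    "W1": [[-0.0008022, -0.0009099], [-0.001604, -0.001820]], "b1": [-0.0008022, -0.0009099],
    "W2": [[-0.002376, -0.002333], [-0.002646, -0.002598]],   "b2": [-0.003443, -0.003381],
    "W3": [-0.014049, -0.014935],                             "b3": -0.019306,
}

eta = 0.5  # assumed learning rate; the example does not give one
new_params = {k: sgd_step(params[k], grads[k], eta) for k in params}

before = loss_fn(**params)
after = loss_fn(**new_params)
print(f"loss before: {before:.6f}, after one step: {after:.6f}")
```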