🌟 Worked Example: Propagation Through a Three-Layer Neural Network ($L_0 \to L_1 \to L_2$)

To keep the arithmetic tractable, the network is reduced to: input layer (2 neurons) → hidden layer 1 (2 neurons) → output layer (1 neuron).

Conventions:

  • Activation function: Sigmoid, $\sigma(z) = \frac{1}{1+e^{-z}}$, with derivative $\sigma'(z) = \sigma(z)(1-\sigma(z))$.
  • Loss function: mean squared error (MSE), $C = \frac{1}{2}(\hat{y} - y)^2$.
  • Learning rate: $\eta = 0.1$.

Initial parameters and inputs:

| Parameter | Value |
| --- | --- |
| Input $x$ (i.e. $a^{(0)}$) | $[0.05, 0.10]$ |
| Target $y$ | $[0.01]$ |
| $L_0 \to L_1$ weights $W^{(1)}$ | $\begin{pmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{pmatrix}$ |
| $L_1$ biases $b^{(1)}$ | $[0.35, 0.35]$ |
| $L_1 \to L_2$ weights $W^{(2)}$ | $\begin{pmatrix} 0.40 \\ 0.45 \end{pmatrix}$ (a $1 \times 2$ row vector after transposing) |
| $L_2$ bias $b^{(2)}$ | $[0.60]$ |

🚀 Phase 1: Forward Pass

1. From $L_0$ to $L_1$ (hidden layer)

  • Weighted inputs $z^{(1)}$ of $L_1$:
    $z^{(1)}_1 = w^{(1)}_{11}x_1 + w^{(1)}_{21}x_2 + b^{(1)}_1 = (0.15)(0.05) + (0.25)(0.10) + 0.35 = 0.0075 + 0.025 + 0.35 = \mathbf{0.3825}$
    $z^{(1)}_2 = w^{(1)}_{12}x_1 + w^{(1)}_{22}x_2 + b^{(1)}_2 = (0.20)(0.05) + (0.30)(0.10) + 0.35 = 0.01 + 0.03 + 0.35 = \mathbf{0.39}$

  • Activations $a^{(1)}$ of $L_1$:
    $a^{(1)}_1 = \sigma(0.3825) = \frac{1}{1+e^{-0.3825}} \approx \mathbf{0.594}$
    $a^{(1)}_2 = \sigma(0.39) = \frac{1}{1+e^{-0.39}} \approx \mathbf{0.596}$
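The hidden-layer computation above can be sanity-checked with a few lines of Python (a minimal sketch; the variable names are ad hoc, and the last digit may differ from the rounded values in the text):

```python
import math

def sigmoid(z):
    """Logistic activation: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Inputs and L0 -> L1 parameters from the table above
x1, x2 = 0.05, 0.10
b = 0.35

# Weighted inputs of the two hidden neurons
z1 = 0.15 * x1 + 0.25 * x2 + b   # 0.3825
z2 = 0.20 * x1 + 0.30 * x2 + b   # 0.39

# Activations (the text rounds these to 0.594 and 0.596)
a1 = sigmoid(z1)
a2 = sigmoid(z2)
```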

2. From $L_1$ to $L_2$ (output layer)

  • Weighted input $z^{(2)}$ of $L_2$:
    $z^{(2)}_1 = w^{(2)}_{11}a^{(1)}_1 + w^{(2)}_{21}a^{(1)}_2 + b^{(2)}_1 = (0.40)(0.594) + (0.45)(0.596) + 0.60 = 0.2376 + 0.2682 + 0.60 = \mathbf{1.1058}$

  • Final output $\hat{y}$ of $L_2$ (i.e. $a^{(2)}$):
    $\hat{y} = a^{(2)}_1 = \sigma(1.1058) \approx \mathbf{0.751}$

3. Total loss $C$

  • Mean squared error:
    $C = \frac{1}{2}(\hat{y} - y)^2 = \frac{1}{2}(0.751 - 0.01)^2 = \frac{1}{2}(0.741)^2 \approx \mathbf{0.2745}$
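Continuing the same sketch through the output layer and the loss (the rounded hidden activations are reused, so the last digit can drift slightly):

```python
import math

# Rounded hidden activations and L1 -> L2 parameters from above
a1, a2 = 0.594, 0.596
b2, y = 0.60, 0.01

z_out = 0.40 * a1 + 0.45 * a2 + b2        # 1.1058
y_hat = 1.0 / (1.0 + math.exp(-z_out))    # ~0.751
loss = 0.5 * (y_hat - y) ** 2             # ~0.2745
```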

📉 Phase 2: Backward Pass

1. Error term $\delta^{(2)}$ of the output layer $L_2$

$\delta^{(2)} = \frac{\partial C}{\partial a^{(2)}} \odot \sigma'(z^{(2)})$

  • Derivative of the loss with respect to the output, $\frac{\partial C}{\partial a^{(2)}}$: $\hat{y} - y = 0.751 - 0.01 = 0.741$
  • Sigmoid derivative $\sigma'(z^{(2)})$: $\hat{y}(1-\hat{y}) = 0.751(1-0.751) \approx 0.187$
  • Error term $\delta^{(2)}$:
    $\delta^{(2)} = 0.741 \times 0.187 \approx \mathbf{0.1384}$
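The same arithmetic for $\delta^{(2)}$, as a quick check (with these rounded inputs the product lands near 0.1386; the text carries 0.1384 from slightly different rounding):

```python
# Output-layer error term: delta2 = (y_hat - y) * sigma'(z)
y_hat, y = 0.751, 0.01
dC_da = y_hat - y                # 0.741
dsig = y_hat * (1.0 - y_hat)     # ~0.187
delta2 = dC_da * dsig            # ~0.1386 with these rounded inputs
```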

2. $L_2$ gradients and updates for $W^{(2)}, b^{(2)}$

  • Bias gradient $\frac{\partial C}{\partial b^{(2)}}$: $\delta^{(2)} = \mathbf{0.1384}$

  • Weight gradients $\frac{\partial C}{\partial W^{(2)}} = \delta^{(2)} \cdot (a^{(1)})^T$:
    $\frac{\partial C}{\partial w^{(2)}_{11}} = \delta^{(2)} a^{(1)}_1 = 0.1384 \times 0.594 \approx \mathbf{0.0822}$
    $\frac{\partial C}{\partial w^{(2)}_{21}} = \delta^{(2)} a^{(1)}_2 = 0.1384 \times 0.596 \approx \mathbf{0.0825}$

  • Gradient-descent update of $W^{(2)}, b^{(2)}$ ($\eta = 0.1$):
    $w^{(2)}_{11,\text{new}} = 0.40 - 0.1 \times 0.0822 \approx \mathbf{0.3918}$
    $w^{(2)}_{21,\text{new}} = 0.45 - 0.1 \times 0.0825 \approx \mathbf{0.4418}$
    $b^{(2)}_{\text{new}} = 0.60 - 0.1 \times 0.1384 \approx \mathbf{0.5862}$
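The output-layer update rule $\theta \leftarrow \theta - \eta \, \partial C/\partial \theta$ in code form (again a sketch using the rounded values above):

```python
eta = 0.1
delta2 = 0.1384              # output-layer error term from the previous step
a1, a2 = 0.594, 0.596        # hidden activations

# Gradients: dC/dw = delta2 * incoming activation, dC/db = delta2
gw11 = delta2 * a1           # ~0.0822
gw21 = delta2 * a2           # ~0.0825

# Gradient-descent step on the old parameters
w11_new = 0.40 - eta * gw11    # ~0.3918
w21_new = 0.45 - eta * gw21    # ~0.4418
b2_new = 0.60 - eta * delta2   # ~0.5862
```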

3. Error term $\delta^{(1)}$ of the hidden layer $L_1$

$\delta^{(1)} = \left( (W^{(2)})^T \delta^{(2)} \right) \odot \sigma'(z^{(1)})$

  • Propagated error $E_{\text{prop}}$ (using the old weights $W^{(2)}$):
    $E_{\text{prop},1} = w^{(2)}_{11} \delta^{(2)} = 0.40 \times 0.1384 \approx 0.05536$
    $E_{\text{prop},2} = w^{(2)}_{21} \delta^{(2)} = 0.45 \times 0.1384 \approx 0.06228$

  • Sigmoid derivative $\sigma'(z^{(1)})$:
    $\sigma'(z^{(1)}_1) = a^{(1)}_1(1-a^{(1)}_1) = 0.594(1-0.594) \approx 0.2412$
    $\sigma'(z^{(1)}_2) = a^{(1)}_2(1-a^{(1)}_2) = 0.596(1-0.596) \approx 0.2408$

  • Error term $\delta^{(1)}$:
    $\delta^{(1)}_1 = E_{\text{prop},1} \times 0.2412 = 0.05536 \times 0.2412 \approx \mathbf{0.01335}$
    $\delta^{(1)}_2 = E_{\text{prop},2} \times 0.2408 = 0.06228 \times 0.2408 \approx \mathbf{0.01500}$
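The hidden-layer error terms, checked in code (the last digits depend on how aggressively the intermediate values are rounded):

```python
delta2 = 0.1384              # output-layer error term
a1, a2 = 0.594, 0.596        # hidden activations

# Propagate the error back through the OLD output weights
e1 = 0.40 * delta2           # 0.05536
e2 = 0.45 * delta2           # 0.06228

# Multiply by the local sigmoid derivative a * (1 - a)
delta1_1 = e1 * a1 * (1.0 - a1)  # ~0.01335
delta1_2 = e2 * a2 * (1.0 - a2)  # ~0.01500
```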

4. $L_1$ gradients and updates for $W^{(1)}, b^{(1)}$

  • Bias gradients $\frac{\partial C}{\partial b^{(1)}}$:
    $\frac{\partial C}{\partial b^{(1)}_1} = \delta^{(1)}_1 = \mathbf{0.01335}$
    $\frac{\partial C}{\partial b^{(1)}_2} = \delta^{(1)}_2 = \mathbf{0.01500}$

  • Weight gradients $\frac{\partial C}{\partial W^{(1)}} = \delta^{(1)} \cdot (a^{(0)})^T$:
    $\frac{\partial C}{\partial w^{(1)}_{11}} = \delta^{(1)}_1 x_1 = 0.01335 \times 0.05 \approx \mathbf{0.00067}$
    $\frac{\partial C}{\partial w^{(1)}_{21}} = \delta^{(1)}_1 x_2 = 0.01335 \times 0.10 \approx \mathbf{0.00134}$
    $\frac{\partial C}{\partial w^{(1)}_{12}} = \delta^{(1)}_2 x_1 = 0.01500 \times 0.05 \approx \mathbf{0.00075}$
    $\frac{\partial C}{\partial w^{(1)}_{22}} = \delta^{(1)}_2 x_2 = 0.01500 \times 0.10 \approx \mathbf{0.00150}$

  • Gradient-descent update of $W^{(1)}, b^{(1)}$ ($\eta = 0.1$):
    $w^{(1)}_{11,\text{new}} = 0.15 - 0.1 \times 0.00067 \approx \mathbf{0.1499}$
    $w^{(1)}_{21,\text{new}} = 0.25 - 0.1 \times 0.00134 \approx \mathbf{0.2499}$
    $w^{(1)}_{12,\text{new}} = 0.20 - 0.1 \times 0.00075 \approx \mathbf{0.1999}$
    $w^{(1)}_{22,\text{new}} = 0.30 - 0.1 \times 0.00150 \approx \mathbf{0.2998}$
    $b^{(1)}_{1,\text{new}} = 0.35 - 0.1 \times \delta^{(1)}_1 \approx \mathbf{0.3487}$
    $b^{(1)}_{2,\text{new}} = 0.35 - 0.1 \times \delta^{(1)}_2 \approx \mathbf{0.3485}$
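And the hidden-layer update step, with the bias updates following the same rule (error terms rounded from the previous step):

```python
eta = 0.1
delta1_1, delta1_2 = 0.01335, 0.01500  # hidden-layer error terms (rounded)
x1, x2 = 0.05, 0.10

# dC/dw = delta * input, dC/db = delta
w11_new = 0.15 - eta * delta1_1 * x1   # ~0.1499
w21_new = 0.25 - eta * delta1_1 * x2   # ~0.2499
w12_new = 0.20 - eta * delta1_2 * x1   # ~0.1999
w22_new = 0.30 - eta * delta1_2 * x2   # ~0.2998
b1_new = 0.35 - eta * delta1_1         # ~0.3487
b2_new = 0.35 - eta * delta1_2         # ~0.3485
```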

Summary

After one complete training cycle (a forward pass followed by a backward pass), every weight $W$ and bias $b$ has been nudged slightly in the direction that lowers the total loss $C$. On the next iteration the network computes with these updated parameters and should, in theory, produce an output $\hat{y}$ closer to $0.01$.
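The whole walkthrough can be reproduced end to end without intermediate rounding, confirming that a single update really does lower the loss. This is a minimal illustrative script in plain Python (the helper names are ad hoc):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, b1, W2, b2, x):
    """Forward pass; W1[i][j] is the weight from input j to hidden neuron i."""
    a1 = [sigmoid(sum(W1[i][j] * x[j] for j in range(2)) + b1[i]) for i in range(2)]
    y_hat = sigmoid(sum(W2[i] * a1[i] for i in range(2)) + b2)
    return a1, y_hat

# Parameters and data from the table above
W1 = [[0.15, 0.25], [0.20, 0.30]]
b1 = [0.35, 0.35]
W2 = [0.40, 0.45]
b2 = 0.60
x, y, eta = [0.05, 0.10], 0.01, 0.1

a1, y_hat = forward(W1, b1, W2, b2, x)
loss_before = 0.5 * (y_hat - y) ** 2           # ~0.2748 at full precision

# Backward pass (delta1 uses the OLD output weights)
delta2 = (y_hat - y) * y_hat * (1.0 - y_hat)
delta1 = [W2[i] * delta2 * a1[i] * (1.0 - a1[i]) for i in range(2)]

# Gradient-descent updates
W2 = [W2[i] - eta * delta2 * a1[i] for i in range(2)]
b2 -= eta * delta2
W1 = [[W1[i][j] - eta * delta1[i] * x[j] for j in range(2)] for i in range(2)]
b1 = [b1[i] - eta * delta1[i] for i in range(2)]

_, y_hat_new = forward(W1, b1, W2, b2, x)
loss_after = 0.5 * (y_hat_new - y) ** 2        # smaller than loss_before
```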
