🧠 Four-Layer Neural Network Example (with Backpropagation)

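📘 Network Structure

| Layer | Name | Neurons |
| --- | --- | --- |
| L₁ | Input layer | 2 |
| L₂ | Hidden layer 1 | 2 |
| L₃ | Hidden layer 2 | 2 |
| L₄ | Output layer | 1 |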

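Activation function: Sigmoid, applied at every layer including the output layer

Loss function: mean squared error (MSE), written for a single sample with a factor of 1/2: $L = \frac{1}{2}(y_{true} - \hat{y})^2$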
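
As a minimal sketch of these two building blocks, the helpers below implement the sigmoid, its derivative expressed through the sigmoid's own output, and the single-sample loss with the 1/2 factor. The function names are illustrative choices, not part of the original article.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime_from_output(a: float) -> float:
    """Derivative of the sigmoid, written via its output a = sigmoid(z)."""
    return a * (1.0 - a)

def half_squared_error(y_true: float, y_pred: float) -> float:
    """Single-sample loss L = 1/2 * (y_true - y_pred)^2."""
    return 0.5 * (y_true - y_pred) ** 2

print(sigmoid(0.0))                    # 0.5
print(sigmoid_prime_from_output(0.5))  # 0.25
print(half_squared_error(1.0, 0.5))    # 0.125
```

Writing the derivative in terms of the activation `a` is convenient in backpropagation, because the activations are already available from the forward pass.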


⚙️ I. Forward Propagation

Input:
$x_1 = 1, \quad x_2 = 2$

Target output:
$y_{true} = 1$

1️⃣ Weights and biases

| Connection | Weight matrix | Bias |
| --- | --- | --- |
| L1→L2 | $W_1 = \begin{bmatrix}0.1 & 0.2\\0.3 & 0.4\end{bmatrix}$ | $b_1 = [0.1, 0.2]$ |
| L2→L3 | $W_2 = \begin{bmatrix}0.5 & 0.6\\0.7 & 0.8\end{bmatrix}$ | $b_2 = [0.1, 0.2]$ |
| L3→L4 | $W_3 = \begin{bmatrix}0.9\\1.0\end{bmatrix}$ | $b_3 = 0.3$ |

2️⃣ Layer-by-layer computation

Hidden layer 1 (L₂)

$z^{(2)} = XW_1 + b_1$
$z^{(2)} = \begin{bmatrix}1 & 2\end{bmatrix}\begin{bmatrix}0.1 & 0.2\\0.3 & 0.4\end{bmatrix} + \begin{bmatrix}0.1 & 0.2\end{bmatrix} = \begin{bmatrix}0.8 & 1.2\end{bmatrix}$

Activation:
$a^{(2)} = \sigma(z^{(2)}) = [0.68997, 0.76852]$
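
The hidden layer 1 computation can be reproduced with a small sketch in plain Python. It assumes the row-vector convention $z = XW + b$ used above; `dense_forward` is a hypothetical helper name introduced for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(x, W, b):
    """Row-vector convention: z_j = sum_i x_i * W[i][j] + b_j, a_j = sigmoid(z_j)."""
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    a = [sigmoid(v) for v in z]
    return z, a

X = [1.0, 2.0]
W1 = [[0.1, 0.2], [0.3, 0.4]]
b1 = [0.1, 0.2]

z2, a2 = dense_forward(X, W1, b1)
print([round(v, 5) for v in z2])  # [0.8, 1.2]
print([round(v, 5) for v in a2])  # [0.68997, 0.76852]
```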


Hidden layer 2 (L₃)

$z^{(3)} = a^{(2)}W_2 + b_2$
$z^{(3)} = \begin{bmatrix}0.68997 & 0.76852\end{bmatrix}\begin{bmatrix}0.5 & 0.6\\0.7 & 0.8\end{bmatrix} + \begin{bmatrix}0.1 & 0.2\end{bmatrix} = \begin{bmatrix}0.98295 & 1.22880\end{bmatrix}$

Activation:
$a^{(3)} = \sigma(z^{(3)}) = [0.7277, 0.7736]$


Output layer (L₄)

$z^{(4)} = a^{(3)}W_3 + b_3$
$z^{(4)} = \begin{bmatrix}0.7277 & 0.7736\end{bmatrix}\begin{bmatrix}0.9\\1.0\end{bmatrix} + 0.3 = 1.7285$

Activation:
$\hat{y} = \sigma(1.7285) = 0.8492$


Forward propagation results

| Activation | Output |
| --- | --- |
| $a^{(2)}$ | 0.68997, 0.76852 |
| $a^{(3)}$ | 0.7277, 0.7736 |
| $a^{(4)} = \hat{y}$ | 0.8492 |

Loss:
$L = \frac{1}{2}(y_{true} - \hat{y})^2 = \frac{1}{2}(1 - 0.8492)^2 = 0.0114$
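
The whole forward pass can be written as a short script, which is a convenient way to cross-check the hand-computed activations and loss. This is a sketch in plain Python using the parameters listed in the weights table; `dense_forward` is an illustrative helper, not something from the original article.

```python
import math

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(x, W, b):
    """One fully connected layer in row-vector form: a = sigmoid(x W + b)."""
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return [sigmoid(v) for v in z]

# Sample and parameters from the walkthrough.
X, y_true = [1.0, 2.0], 1.0
W1, b1 = [[0.1, 0.2], [0.3, 0.4]], [0.1, 0.2]
W2, b2 = [[0.5, 0.6], [0.7, 0.8]], [0.1, 0.2]
W3, b3 = [[0.9], [1.0]], [0.3]

a2 = dense_forward(X, W1, b1)          # hidden layer 1
a3 = dense_forward(a2, W2, b2)         # hidden layer 2
y_hat = dense_forward(a3, W3, b3)[0]   # output layer (single neuron)
loss = 0.5 * (y_true - y_hat) ** 2     # half squared error

print("a2    =", [round(v, 5) for v in a2])
print("a3    =", [round(v, 5) for v in a3])
print("y_hat =", round(y_hat, 5))
print("loss  =", round(loss, 5))
```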


🔁 II. Backpropagation

1️⃣ Output layer gradients

$\frac{dL}{d\hat{y}} = \hat{y} - y_{true} = 0.8492 - 1 = -0.1508$
$\frac{d\hat{y}}{dz^{(4)}} = \hat{y}(1-\hat{y}) = 0.1280$
$\frac{dL}{dz^{(4)}} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dz^{(4)}} = -0.01931$

Weight and bias gradients:
$dW_3 = a^{(3)T}\frac{dL}{dz^{(4)}} = \begin{bmatrix}0.7277\\0.7736\end{bmatrix}(-0.01931) = \begin{bmatrix}-0.01405\\-0.01493\end{bmatrix}$
$db_3 = \frac{dL}{dz^{(4)}} = -0.01931$
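
The three chain-rule factors for the output layer can be packaged into one small function. The sketch below assumes a single sigmoid output neuron and the half squared error loss; the function name and the input values in the usage lines are illustrative (chosen so the arithmetic is exact), not the walkthrough's numbers.

```python
def output_layer_gradients(a_prev, y_hat, y_true):
    """Gradients for a single sigmoid output neuron with L = 1/2 (y_true - y_hat)^2."""
    dL_dy = y_hat - y_true            # dL/dy_hat
    dy_dz = y_hat * (1.0 - y_hat)     # sigmoid derivative via its output
    delta = dL_dy * dy_dz             # dL/dz for the output neuron
    dW = [a * delta for a in a_prev]  # one weight gradient per incoming activation
    db = delta                        # the bias gradient equals dL/dz
    return delta, dW, db

# Illustrative inputs only.
delta, dW, db = output_layer_gradients(a_prev=[0.5, 1.0], y_hat=0.5, y_true=1.0)
print(delta)  # -0.125
print(dW)     # [-0.0625, -0.125]
print(db)     # -0.125
```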


2️⃣ Backpropagating to hidden layer 2 (L₃)

$\frac{dL}{da^{(3)}} = \frac{dL}{dz^{(4)}}W_3^T = [-0.01738, -0.01931]$
$\frac{da^{(3)}}{dz^{(3)}} = a^{(3)}(1-a^{(3)}) = [0.1982, 0.1751]$
$\frac{dL}{dz^{(3)}} = \frac{dL}{da^{(3)}} \odot \frac{da^{(3)}}{dz^{(3)}} = [-0.003443, -0.003381]$

Gradients:
$dW_2 = a^{(2)T}\frac{dL}{dz^{(3)}} = \begin{bmatrix}-0.002376 & -0.002333\\-0.002646 & -0.002598\end{bmatrix}$
$db_2 = \frac{dL}{dz^{(3)}} = [-0.003443, -0.003381]$
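
The same three steps (propagate through the weights, multiply by the sigmoid derivative, take the outer product with the incoming activations) apply to any hidden layer. The sketch below is one way to express them; `hidden_layer_backward` and its argument names are hypothetical, and the usage values are illustrative numbers chosen to keep the arithmetic exact.

```python
def hidden_layer_backward(delta_next, W_next, a, a_prev):
    """
    delta_next: dL/dz of the following layer (length n_out)
    W_next:     weights from this layer to the next, shape (n_hidden, n_out)
    a:          this layer's sigmoid activations (length n_hidden)
    a_prev:     activations feeding this layer (length n_in)
    """
    # dL/da = delta_next W_next^T
    dA = [sum(delta_next[j] * W_next[i][j] for j in range(len(delta_next)))
          for i in range(len(a))]
    # dL/dz = dL/da * a * (1 - a), element-wise
    delta = [dA[i] * a[i] * (1.0 - a[i]) for i in range(len(a))]
    # dW = a_prev^T delta (outer product), db = delta
    dW = [[a_prev[k] * delta[i] for i in range(len(delta))] for k in range(len(a_prev))]
    db = list(delta)
    return delta, dW, db

# Illustrative inputs only.
delta, dW, db = hidden_layer_backward(
    delta_next=[0.5], W_next=[[2.0], [4.0]], a=[0.5, 0.5], a_prev=[1.0, 2.0])
print(delta)  # [0.25, 0.5]
print(dW)     # [[0.25, 0.5], [0.5, 1.0]]
```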


3️⃣ Backpropagating to hidden layer 1 (L₂)

$\frac{dL}{da^{(2)}} = \frac{dL}{dz^{(3)}}W_2^T = [-0.003750, -0.005115]$
$\frac{da^{(2)}}{dz^{(2)}} = a^{(2)}(1-a^{(2)}) = [0.2139, 0.1779]$
$\frac{dL}{dz^{(2)}} = \frac{dL}{da^{(2)}} \odot \frac{da^{(2)}}{dz^{(2)}} = [-0.000802, -0.000910]$

Gradients:
$dW_1 = X^T\frac{dL}{dz^{(2)}} = \begin{bmatrix}-0.000802 & -0.000910\\-0.001604 & -0.001820\end{bmatrix}$
$db_1 = \frac{dL}{dz^{(2)}} = [-0.000802, -0.000910]$
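
A common way to validate hand-derived gradients is to compare them with central finite differences of the loss. The sketch below implements the full forward and backward pass for this 2-2-2-1 network in plain Python and checks one weight per layer. The function names and the step size `eps = 1e-6` are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Return the list of activations [x, a2, a3, a4]."""
    acts = [x]
    for W, b in params:
        prev = acts[-1]
        z = [sum(prev[i] * W[i][j] for i in range(len(prev))) + b[j]
             for j in range(len(b))]
        acts.append([sigmoid(v) for v in z])
    return acts

def loss_fn(params, x, y_true):
    y_hat = forward(params, x)[-1][0]
    return 0.5 * (y_true - y_hat) ** 2

def backward(params, x, y_true):
    """Return [(dW1, db1), (dW2, db2), (dW3, db3)] computed with the chain rule."""
    acts = forward(params, x)
    dA = [acts[-1][0] - y_true]  # dL/dy_hat
    grads = []
    for layer in reversed(range(len(params))):
        W, _ = params[layer]
        a_in, a_out = acts[layer], acts[layer + 1]
        # dL/dz = dL/da * a (1 - a)
        delta = [dA[j] * a_out[j] * (1.0 - a_out[j]) for j in range(len(a_out))]
        dW = [[a_in[i] * delta[j] for j in range(len(delta))] for i in range(len(a_in))]
        grads.append((dW, delta))
        # dL/da of the previous layer = delta W^T
        dA = [sum(delta[j] * W[i][j] for j in range(len(delta))) for i in range(len(a_in))]
    return list(reversed(grads))

def numeric_grad_W(params, x, y_true, layer, i, j, eps=1e-6):
    """Central finite difference of the loss with respect to one weight."""
    W = params[layer][0]
    orig = W[i][j]
    W[i][j] = orig + eps
    loss_plus = loss_fn(params, x, y_true)
    W[i][j] = orig - eps
    loss_minus = loss_fn(params, x, y_true)
    W[i][j] = orig  # restore the weight
    return (loss_plus - loss_minus) / (2 * eps)

params = [
    ([[0.1, 0.2], [0.3, 0.4]], [0.1, 0.2]),  # W1, b1
    ([[0.5, 0.6], [0.7, 0.8]], [0.1, 0.2]),  # W2, b2
    ([[0.9], [1.0]], [0.3]),                 # W3, b3
]
X, y_true = [1.0, 2.0], 1.0

grads = backward(params, X, y_true)
for layer in range(3):
    analytic = grads[layer][0][0][0]
    numeric = numeric_grad_W(params, X, y_true, layer, 0, 0)
    print(f"dW{layer + 1}[0][0]: analytic={analytic:.6f}, numeric={numeric:.6f}")
```

If the analytic and numeric columns agree to several decimal places, the chain-rule derivation is very likely correct; a mismatch in one layer usually points to a transposed matrix or a missing sigmoid derivative at that layer.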


🧩 III. Backpropagation Computation Graph

        (x1,x2)
           │
           ▼
       [Layer1] -----------→  dL/dz2 → dW1
           │
           ▼
       [Layer2] -----------→  dL/dz3 → dW2
           │
           ▼
       [Output] -----------→  dL/dz4 → dW3

Or, as a full arrow diagram:

Forward:  X → Z2 → A2 → Z3 → A3 → Z4 → A4(ŷ)
Backward:        ← dZ2 ← dA2 ← dZ3 ← dA3 ← dZ4 ← dA4
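
The backward chain in the diagram can be summarized as a recursion. In the block below, $\delta^{(l)}$ is shorthand introduced here for $\frac{dL}{dz^{(l)}}$ (the article itself writes the derivative out in full), $a^{(1)} = X$, and $\odot$ denotes element-wise multiplication.

```latex
\begin{aligned}
\delta^{(4)} &= (\hat{y} - y_{true}) \cdot \hat{y}\,(1 - \hat{y}) \\
\delta^{(l)} &= \left(\delta^{(l+1)} W_l^{T}\right) \odot a^{(l)} \odot \left(1 - a^{(l)}\right), \qquad l = 3, 2 \\
dW_l &= a^{(l)T}\,\delta^{(l+1)}, \qquad db_l = \delta^{(l+1)}, \qquad l = 1, 2, 3
\end{aligned}
```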

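✅ IV. Summary

| Step | Content | Description |
| --- | --- | --- |
| Forward propagation | X → Ŷ | Compute the predicted output |
| Compute loss | L(Ŷ, Y) | Measure the error |
| Backpropagation | dL/dW, dL/db | Compute the gradients |
| Parameter update | W ← W - η·dW | Update the parameters using the learning rate η |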
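
The update rule W ← W - η·dW can be sketched in a few lines. The article does not specify a learning rate, so `lr=0.5` and the matrices below are purely illustrative values, and `sgd_step` is a hypothetical helper name.

```python
def sgd_step(W, dW, lr):
    """Return W - lr * dW for a weight matrix stored as a list of rows."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, dW)]

# Illustrative values only (not the gradients from the walkthrough).
W = [[1.0, 2.0], [3.0, 4.0]]
dW = [[0.5, -0.5], [1.0, 0.0]]

print(sgd_step(W, dW, lr=0.5))  # [[0.75, 2.25], [2.5, 4.0]]
```

A negative gradient entry increases the corresponding weight, which is why all weights in the walkthrough would grow slightly after one step: every gradient there is negative because the prediction is below the target.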