Deep Learning: Neural Networks Explained (Part 2), with Clear Step-by-Step Hand Calculations for Complete Beginners


Previous article: Deep Learning: Neural Networks Explained (Part 1), with Clear Step-by-Step Hand Calculations for Complete Beginners

A concrete worked example: hand-calculating a house-price prediction with a neural network model

Suppose we have a simple neural network, the same one as before: 2 input nodes, 3 hidden nodes, and 1 output node. We use the following simplified example data:

(I) The Functions Involved

  1. Sigmoid function: we use the sigmoid as the activation function; it squashes any real input into the range (0, 1). Its formula is:
    $\sigma(z) = \frac{1}{1 + e^{-z}}$

  2. Derivative of the sigmoid: the derivative needed to compute gradients during backpropagation:
    $\sigma'(z) = \sigma(z)(1 - \sigma(z))$

  3. Mean squared error (MSE) loss: a measure of how far the predictions are from the true values. Its formula is:
    $L = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

    For a single sample this simplifies to:
    $L = (y - \hat{y})^2$

  4. Derivative of the MSE loss

    For a single sample it simplifies to:
    $\frac{\partial L}{\partial \hat{y}} = -2(y - \hat{y})$
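The four functions above translate directly into a few lines of Python. A minimal sketch (the function names are our own choice):

```python
import math

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def mse_loss(y, y_hat):
    """Single-sample MSE loss: (y - y_hat)^2."""
    return (y - y_hat) ** 2

def mse_loss_prime(y, y_hat):
    """Derivative of the single-sample MSE loss with respect to y_hat."""
    return -2.0 * (y - y_hat)
```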

5. Gradient of a weight

The formulas that involve $w_1$ are:
$L = (y - \hat{y})^2$
$\hat{y} = \sigma(w_7 h_1 + w_8 h_2 + w_9 h_3)$
$h_1 = \sigma(w_1 x_1 + b_1 + w_4 x_2 + b_4)$

For example, for $w_1$, the gradient is given by the chain rule:
$\frac{dL}{dw_1} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dh_1} \cdot \frac{dh_1}{dw_1}$

(1) $\frac{dL}{d\hat{y}} = -2(y - \hat{y})$

(2) Computing $\frac{d\hat{y}}{dh_1}$

Let $z = w_7 h_1 + w_8 h_2 + w_9 h_3$,

so that $\hat{y} = \sigma(z) = \sigma(w_7 h_1 + w_8 h_2 + w_9 h_3)$,

and recall that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

Therefore
$\frac{d\hat{y}}{dh_1} = \sigma(z) \cdot (1 - \sigma(z)) \cdot \frac{dz}{dh_1}$
$\frac{d\hat{y}}{dh_1} = \sigma(w_7 h_1 + w_8 h_2 + w_9 h_3) \cdot (1 - \sigma(w_7 h_1 + w_8 h_2 + w_9 h_3)) \cdot w_7 = \hat{y}(1 - \hat{y}) \cdot w_7$

(3) Computing $\frac{dh_1}{dw_1}$ in the same way:
$h_1 = \sigma(w_1 x_1 + b_1 + w_4 x_2 + b_4)$
$\frac{dh_1}{dw_1} = \sigma(w_1 x_1 + b_1 + w_4 x_2 + b_4) \cdot (1 - \sigma(w_1 x_1 + b_1 + w_4 x_2 + b_4)) \cdot x_1 = h_1(1 - h_1) \cdot x_1$

Therefore
$\frac{dL}{dw_1} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dh_1} \cdot \frac{dh_1}{dw_1} = -2(y - \hat{y}) \cdot \hat{y}(1 - \hat{y}) \, w_7 \cdot h_1(1 - h_1) \, x_1$
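A handy way to check a chain-rule derivation like this is to compare it against a finite-difference estimate of the same gradient. A minimal sketch, using arbitrary made-up values for the inputs, weights, and the other two hidden activations (which do not depend on $w_1$):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary example values, for illustration only.
x1, x2, y = 0.5, 1.0, 1.0
b1, b4 = 0.0, 0.0
w1, w4 = 0.2, 0.5
w7, w8, w9 = 0.1, 0.2, 0.3
h2, h3 = 0.6, 0.7  # held fixed: h2 and h3 do not involve w1

def forward(w1_val):
    h1 = sigmoid(w1_val * x1 + b1 + w4 * x2 + b4)
    y_hat = sigmoid(w7 * h1 + w8 * h2 + w9 * h3)
    return h1, y_hat

# Analytical gradient from the chain rule derived above.
h1, y_hat = forward(w1)
analytic = -2 * (y - y_hat) * y_hat * (1 - y_hat) * w7 * h1 * (1 - h1) * x1

# Central finite difference: (L(w1 + eps) - L(w1 - eps)) / (2 * eps).
eps = 1e-6
_, y_plus = forward(w1 + eps)
_, y_minus = forward(w1 - eps)
numeric = ((y - y_plus) ** 2 - (y - y_minus) ** 2) / (2 * eps)

print(analytic, numeric)  # the two estimates should agree closely
```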

(II) The Parameter Update Process

1. Input data (one sample)
  • Input features: $X = [120, 1]$ (an area of 120 square meters, in a city-center location)
  • Target value: $y = 300{,}000$ (a house price of 300,000 yuan)
2. Randomly initialized input-to-hidden weights and biases
  • Weights $W_h = [w_1, w_2, w_3, w_4, w_5, w_6] = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]$
  • Biases $b_h = [b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8, b_9] = [0, 0, 0, 0, 0, 0, 0, 0, 0]$ (all biases assumed to be 0)
3. Hidden-layer activation outputs

$h_1 = \sigma(w_1 \cdot x_1 + w_4 \cdot x_2)$
$h_2 = \sigma(w_2 \cdot x_1 + w_5 \cdot x_2)$
$h_3 = \sigma(w_3 \cdot x_1 + w_6 \cdot x_2)$

4. Concrete calculation

$h_1 = \sigma(0.2 \times 120 + 0.5 \times 1) = \sigma(24.5)$
$h_2 = \sigma(0.3 \times 120 + 0.6 \times 1) = \sigma(36.6)$
$h_3 = \sigma(0.4 \times 120 + 0.7 \times 1) = \sigma(48.7)$

Assume $\sigma(24.5) \approx 0.99$, $\sigma(36.6) \approx 0.998$, $\sigma(48.7) \approx 0.999$ (rough placeholder values for the sigmoid; strictly speaking, for inputs this large the sigmoid is saturated essentially at 1)

5. Hidden-to-output weights
  • Weights $W_o = [w_7, w_8, w_9] = [0.1, 0.2, 0.3]$
6. Output-layer prediction

$\hat{y} = \sigma(w_7 \cdot h_1 + w_8 \cdot h_2 + w_9 \cdot h_3) = \sigma(0.1 \times 0.99 + 0.2 \times 0.998 + 0.3 \times 0.999) = \sigma(0.5983) \approx 0.6452$

7. Loss calculation

$L = (y - \hat{y})^2 \approx (300{,}000 - 0.6452)^2 \approx 89{,}999{,}612{,}880$ (using the approximate value of $\hat{y}$; the loss is enormous because the sigmoid output lives in $(0, 1)$ while the target is 300,000, which is why real pipelines normalize the target)
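Steps 1 through 7 can be reproduced with a few lines of Python; a minimal sketch (it uses the exact sigmoid rather than the rounded values above, so the printed numbers differ slightly from the hand calculation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: the sample.
x1, x2 = 120.0, 1.0
y = 300_000.0

# Steps 2 and 5: initial weights (all biases are 0 and omitted).
w1, w2, w3, w4, w5, w6 = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7
w7, w8, w9 = 0.1, 0.2, 0.3

# Steps 3-4: hidden-layer activations (all saturate close to 1).
h1 = sigmoid(w1 * x1 + w4 * x2)  # sigmoid(24.5)
h2 = sigmoid(w2 * x1 + w5 * x2)  # sigmoid(36.6)
h3 = sigmoid(w3 * x1 + w6 * x2)  # sigmoid(48.7)

# Step 6: output-layer prediction.
y_hat = sigmoid(w7 * h1 + w8 * h2 + w9 * h3)
print(y_hat)  # ~0.6457, close to the hand-calculated 0.6452

# Step 7: single-sample MSE loss.
L = (y - y_hat) ** 2
print(L)  # ~9.0e10, the enormous loss noted above
```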

8. Gradient calculation

(1) $\frac{dL}{d\hat{y}} = -2(y - \hat{y}) = -2(300{,}000 - 0.6452) \approx -599{,}998.7096$

(2) $\frac{d\hat{y}}{dh_1}$, $\frac{d\hat{y}}{dh_2}$, $\frac{d\hat{y}}{dh_3}$

$\frac{d\hat{y}}{dh_1} = \hat{y}(1 - \hat{y}) \cdot w_7 \approx 0.6452 \times 0.3548 \times 0.1 \approx 0.0229$

$\frac{d\hat{y}}{dh_2} = \hat{y}(1 - \hat{y}) \cdot w_8 \approx 0.6452 \times 0.3548 \times 0.2 \approx 0.0458$

$\frac{d\hat{y}}{dh_3} = \hat{y}(1 - \hat{y}) \cdot w_9 \approx 0.6452 \times 0.3548 \times 0.3 \approx 0.0687$

(3) $\frac{dh_1}{dw_1}$
$\frac{dh_1}{dw_1} = h_1(1 - h_1) \cdot x_1 \approx 0.99 \times 0.01 \times 120 = 1.188$

$\frac{dL}{dw_1} = \frac{dL}{d\hat{y}} \cdot \frac{d\hat{y}}{dh_1} \cdot \frac{dh_1}{dw_1} \approx -599{,}998.7096 \times 0.0229 \times 1.188 \approx -16{,}300$
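The same arithmetic in Python, reusing the rounded values from the hand calculation so the numbers line up:

```python
# Rounded values carried over from the hand calculation above.
y, y_hat = 300_000.0, 0.6452
h1, x1 = 0.99, 120.0
w7 = 0.1

dL_dyhat  = -2 * (y - y_hat)          # ~ -599,998.7096
dyhat_dh1 = y_hat * (1 - y_hat) * w7  # ~ 0.0229
dh1_dw1   = h1 * (1 - h1) * x1        # ~ 1.188
dL_dw1 = dL_dyhat * dyhat_dh1 * dh1_dw1
print(dL_dw1)  # ~ -16,300 (the exact figure depends on rounding)
```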

9. Parameter update

Here we set the learning rate $\alpha$ to 0.001. Gradient descent moves each weight a small step against its gradient:
$w_1^{new} = w_1^{old} - \alpha \cdot \frac{\partial L}{\partial w_1} = 0.2 - 0.001 \times (-16{,}300) \approx 16.5$

(A single step moving $w_1$ from 0.2 all the way to about 16.5 is, like the huge loss above, a symptom of the unnormalized inputs and target.)
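The update rule itself is one line; a minimal sketch:

```python
alpha = 0.001       # learning rate
w1_old = 0.2
dL_dw1 = -16_300.0  # gradient from step 8 (rounded)

w1_new = w1_old - alpha * dL_dw1  # step against the gradient
print(w1_new)  # ~16.5
```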

At this point, the update of parameter $w_1$ using the single sample $[x_1, x_2] = [120, 1]$ and this simple neural network is complete.

The remaining work is to update the other parameters the same way we updated $w_1$. Once every parameter has been updated once, that is called one iteration. Once every sample in your dataset has been fed through the model and trained on once, that is called one epoch. Training a model well takes many epochs, often hundreds or even thousands; the whole loop is sketched below.
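To make the iteration and epoch ideas concrete, here is a minimal end-to-end training sketch. The two-sample dataset, the initialization range, and the epoch count are invented for illustration; with unnormalized data like this, real code would scale the inputs and target first:

```python
import math
import random

def sigmoid(z):
    """Numerically safe sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical tiny dataset: ([area, city_center], price).
data = [([120.0, 1.0], 300_000.0), ([80.0, 0.0], 150_000.0)]

w = [random.uniform(-0.5, 0.5) for _ in range(9)]  # w1..w9
alpha = 0.001

for epoch in range(1000):          # one epoch = one pass over all samples
    for (x1, x2), y in data:       # each sample yields one parameter update
        # Forward pass: h_i = sigmoid(w_i * x1 + w_{i+3} * x2), as in step 3.
        h = [sigmoid(w[i] * x1 + w[i + 3] * x2) for i in range(3)]
        y_hat = sigmoid(sum(w[6 + i] * h[i] for i in range(3)))

        # Backward pass: the chain rule derived above.
        dL_dyhat = -2 * (y - y_hat)
        dyhat_dz = y_hat * (1 - y_hat)
        for i in range(3):
            dL_dhi = dL_dyhat * dyhat_dz * w[6 + i]   # uses w7..w9 before update
            dhi_dzi = h[i] * (1 - h[i])
            w[6 + i] -= alpha * dL_dyhat * dyhat_dz * h[i]  # w7..w9
            w[i]     -= alpha * dL_dhi * dhi_dzi * x1       # w1..w3
            w[i + 3] -= alpha * dL_dhi * dhi_dzi * x2       # w4..w6
```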

That completes our walkthrough of one training iteration of a simple neural network.
