Machine Learning -- Neural Networks

Neural Networks

Computation

Neural networks are very simple; a single example makes them clear (the label on the last layer in the figure is a typo, it should be $a^{(3)}_1$):

Notation: $a^{(i)}_j$ denotes the $j$-th unit of layer $i$; $w^{(j)}$ denotes the weight matrix that controls the mapping from layer $j$ to layer $j+1$.

where:

$$
\begin{aligned}
a^{(2)}_1 &= g\Big( w^{(1)}_{10} x_0 + w^{(1)}_{11} x_1 + w^{(1)}_{12} x_2 + w^{(1)}_{13} x_3 \Big)\\
a^{(2)}_2 &= g\Big( w^{(1)}_{20} x_0 + w^{(1)}_{21} x_1 + w^{(1)}_{22} x_2 + w^{(1)}_{23} x_3 \Big)\\
a^{(2)}_3 &= g\Big( w^{(1)}_{30} x_0 + w^{(1)}_{31} x_1 + w^{(1)}_{32} x_2 + w^{(1)}_{33} x_3 \Big)\\
h(x) = a^{(3)}_1 &= g\Big( w^{(2)}_{10} a^{(2)}_0 + w^{(2)}_{11} a^{(2)}_1 + w^{(2)}_{12} a^{(2)}_2 + w^{(2)}_{13} a^{(2)}_3 \Big)
\end{aligned}
$$

Vectorizing this, we get:

$$
x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad
w^{(1)} = \begin{bmatrix}
w^{(1)}_{10} & w^{(1)}_{11} & w^{(1)}_{12} & w^{(1)}_{13} \\
w^{(1)}_{20} & w^{(1)}_{21} & w^{(1)}_{22} & w^{(1)}_{23} \\
w^{(1)}_{30} & w^{(1)}_{31} & w^{(1)}_{32} & w^{(1)}_{33}
\end{bmatrix}
$$

Then we have:

$$
z^{(2)} = w^{(1)}x = \begin{bmatrix} z^{(2)}_1 \\ z^{(2)}_2 \\ z^{(2)}_3 \end{bmatrix}, \qquad
a^{(2)} = g(z^{(2)}) = \begin{bmatrix} a^{(2)}_1 \\ a^{(2)}_2 \\ a^{(2)}_3 \end{bmatrix}
$$

The next layer is:

$$
a^{(2)} = \begin{bmatrix} a^{(2)}_0 \\ a^{(2)}_1 \\ a^{(2)}_2 \\ a^{(2)}_3 \end{bmatrix}, \qquad
w^{(2)} = \begin{bmatrix} w^{(2)}_{10} & w^{(2)}_{11} & w^{(2)}_{12} & w^{(2)}_{13} \end{bmatrix}
$$

$$
z^{(3)} = w^{(2)}a^{(2)} = \begin{bmatrix} z^{(3)}_1 \end{bmatrix}, \qquad
a^{(3)} = g(z^{(3)}) = \begin{bmatrix} a^{(3)}_1 \end{bmatrix}
$$

That is how a neural network computes its output; it is actually quite easy to understand and to implement qwq
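To make this concrete, here is a minimal NumPy sketch of exactly this forward pass. It assumes $g$ is the sigmoid function (the notes never pin $g$ down); the layer sizes match the example above, but the weight values and the input are made up for illustration.

```python
import numpy as np

def g(z):
    """Sigmoid activation (an assumption; the text only calls it g)."""
    return 1.0 / (1.0 + np.exp(-z))

# Input x = [x0, x1, x2, x3], with x0 = 1 acting as the bias unit.
x = np.array([1.0, 0.5, -1.2, 0.7])

# w^(1): 3x4, maps the 4 input units to the 3 hidden units.
# w^(2): 1x4, maps the hidden layer (plus its bias unit a^(2)_0) to the output.
w1 = np.random.randn(3, 4)
w2 = np.random.randn(1, 4)

# z^(2) = w^(1) x,  a^(2) = g(z^(2))
z2 = w1 @ x
a2 = g(z2)

# Prepend the bias unit a^(2)_0 = 1, then z^(3) = w^(2) a^(2), h(x) = g(z^(3)).
a2 = np.concatenate(([1.0], a2))
z3 = w2 @ a2
h = g(z3)
print(h)   # a single number, a^(3)_1
```

Each layer is just a matrix-vector product followed by an element-wise $g$, plus prepending the bias unit.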

Backpropagation

Now the question is how to compute all of those weight matrices $w^{(i)}$. (Notation: $L$ denotes the number of layers of the network, $S_l$ denotes the number of units in layer $l$, and $k$ denotes the number of units in the output layer.)

We still want to use a gradient-descent-like ($GD$) method, so we consider $\min\limits_w J(w)$, where:

$$
J(w) = \frac 1m \sum_{i=1}^m \sum_{k=1}^{S_L} \frac 12 \Big[ \big(h(x_i)\big)_k - y_{ik} \Big]^2
$$
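As a quick sketch, this cost is just an average of squared differences over all examples and output units; here the predictions $h(x_i)$ are assumed to have already been produced by a forward pass, and the arrays are made up:

```python
import numpy as np

def cost(preds, Y):
    """J(w) = (1/m) * sum_i sum_k 0.5 * ((h(x_i))_k - y_ik)^2"""
    m = Y.shape[0]
    return np.sum(0.5 * (preds - Y) ** 2) / m

preds = np.array([[0.8, 0.1],     # h(x_i) for m = 2 examples, S_L = 2 outputs
                  [0.3, 0.9]])
Y     = np.array([[1.0, 0.0],     # labels y_ik
                  [0.0, 1.0]])
print(cost(preds, Y))
```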

So now we need to compute $\frac{\partial J(w)}{\partial w^{(l)}_{ij}}$.

We treat the training examples separately; for a single training example $(x_i, y_i)$:

$$
J_i = \sum_{k=1}^{S_L} \frac 12 \Big[ \big(h(x_i)\big)_k - y_{ik} \Big]^2
$$

We define $\delta^{(l)}_j$ to represent the error of $a^{(l)}_j$ relative to its target, that is:

$$
\delta^{(l)}_j = \frac{\partial J_i}{\partial z^{(l)}_j}
$$

For the last layer:

$$
\begin{aligned}
\delta^{(L)}_j = \frac{\partial J_i}{\partial z^{(L)}_j}
= \frac{\partial J_i}{\partial a^{(L)}_j} \cdot \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j}
&= \frac{\partial \sum\limits_{k=1}^{S_L}\frac 12 \big[(h(x_i))_k - y_{ik}\big]^2}{\partial a^{(L)}_j} \cdot \frac{\partial g(z^{(L)}_j)}{\partial z^{(L)}_j} \\
&= \frac{\partial \sum\limits_{k=1}^{S_L}\frac 12 \big[a^{(L)}_k - y_{ik}\big]^2}{\partial a^{(L)}_j} \cdot g'(z^{(L)}_j)
= \big(a^{(L)}_j - y_{ij}\big) \cdot g'(z^{(L)}_j)
\end{aligned}
$$

And what we actually want to compute is:

$$
\frac{\partial J_i}{\partial w^{(L-1)}_{jk}}
= \frac{\partial J_i}{\partial a^{(L)}_j} \cdot \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} \cdot \frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}
= \delta^{(L)}_j \cdot \frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}
$$

So all we need to compute is $\frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}$.

We also know that:

$$
z^{(L)}_j = \sum_{i=1}^{S_{L-1}} w^{(L-1)}_{ji} a^{(L-1)}_i
$$

So:

$$
\frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}
= \frac{\partial \sum\limits_{i=1}^{S_{L-1}} w^{(L-1)}_{ji} a^{(L-1)}_i}{\partial w^{(L-1)}_{jk}}
= a^{(L-1)}_k
$$

Therefore:

$$
\frac{\partial J_i}{\partial w^{(L-1)}_{jk}} = \delta^{(L)}_j \cdot a^{(L-1)}_k
$$
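In code, these output-layer formulas become one element-wise product (for $\delta^{(L)}$) and one outer product (for the gradient). A rough sketch, assuming $g$ is the sigmoid so that $g'(z) = g(z)\,(1 - g(z))$, with made-up values for a single training example:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

a_prev = np.array([1.0, 0.4, 0.9])   # a^(L-1), with a^(L-1)_0 = 1 as the bias unit
w_last = np.random.randn(2, 3)       # w^(L-1), here with S_L = 2 output units
y      = np.array([1.0, 0.0])        # y_ik for this example

z_L = w_last @ a_prev                # z^(L)
a_L = g(z_L)                         # a^(L) = h(x_i)

# delta^(L)_j = (a^(L)_j - y_ij) * g'(z^(L)_j)
delta_L = (a_L - y) * g(z_L) * (1.0 - g(z_L))

# dJ_i/dw^(L-1)_{jk} = delta^(L)_j * a^(L-1)_k  ->  an outer product
grad_last = np.outer(delta_L, a_prev)
print(grad_last.shape)               # same shape as w_last
```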

Now that we have the last layer, let's see whether we can push this backwards through the earlier layers. We will use a simple example to make the computation more intuitive (when drawing the figure I wrote $w$ as $\varphi$ by mistake qwq):

Suppose we want the partial derivative of $J_i$ with respect to $w^{(3)}_{11}$:

$$
\frac{\partial J_i}{\partial w^{(3)}_{11}}
= \frac{\partial (J_{i1} + J_{i2})}{\partial w^{(3)}_{11}}
= \frac{\partial J_{i1}}{\partial w^{(3)}_{11}} + \frac{\partial J_{i2}}{\partial w^{(3)}_{11}}
$$

We compute $\frac{\partial J_{i1}}{\partial w^{(3)}_{11}}$ and $\frac{\partial J_{i2}}{\partial w^{(3)}_{11}}$ separately.

First the former term, differentiating step by step along the network:

$$
\begin{aligned}
\frac{\partial J_{i1}}{\partial w^{(3)}_{11}}
&= \frac{\partial J_{i1}}{\partial a^{(5)}_1} \cdot \frac{\partial a^{(5)}_1}{\partial z^{(5)}_1} \cdot \frac{\partial z^{(5)}_1}{\partial a^{(4)}_1} \cdot \frac{\partial a^{(4)}_1}{\partial z^{(4)}_1} \cdot \frac{\partial z^{(4)}_1}{\partial w^{(3)}_{11}} \\
&= \delta^{(5)}_1 \cdot \frac{\partial z^{(5)}_1}{\partial a^{(4)}_1} \cdot \frac{\partial a^{(4)}_1}{\partial z^{(4)}_1} \cdot \frac{\partial z^{(4)}_1}{\partial w^{(3)}_{11}}
\end{aligned}
$$

We also have:

$$
\begin{aligned}
z^{(5)}_1 = w^{(4)}_{11}a^{(4)}_1 + w^{(4)}_{12}a^{(4)}_2 \;\rightarrow\;& \frac{\partial z^{(5)}_1}{\partial a^{(4)}_1} = w^{(4)}_{11} \\
a^{(4)}_1 = g(z^{(4)}_1) \;\rightarrow\;& \frac{\partial a^{(4)}_1}{\partial z^{(4)}_1} = g'(z^{(4)}_1) \\
z^{(4)}_1 = w^{(3)}_{11}a^{(3)}_1 + w^{(3)}_{12}a^{(3)}_2 \;\rightarrow\;& \frac{\partial z^{(4)}_1}{\partial w^{(3)}_{11}} = a^{(3)}_1
\end{aligned}
$$

So:

$$
\frac{\partial J_{i1}}{\partial w^{(3)}_{11}} = \delta^{(5)}_1 \cdot w^{(4)}_{11} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1
$$

In the same way we can derive the following (the steps are omitted because they are almost identical to the above, and absolutely not because the formulas are a pain to type qwq):

$$
\frac{\partial J_{i2}}{\partial w^{(3)}_{11}} = \delta^{(5)}_2 \cdot w^{(4)}_{21} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1
$$

Adding these two together gives:

$$
\begin{aligned}
\frac{\partial J_i}{\partial w^{(3)}_{11}}
&= \delta^{(5)}_1 \cdot w^{(4)}_{11} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1 + \delta^{(5)}_2 \cdot w^{(4)}_{21} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1 \\
&= \big(\delta^{(5)}_1 \cdot w^{(4)}_{11} + \delta^{(5)}_2 \cdot w^{(4)}_{21}\big)\cdot g'(z^{(4)}_1) \cdot a^{(3)}_1
\end{aligned}
$$

Now let us define:

$$
\delta^{(4)}_1 = \big(\delta^{(5)}_1 \cdot w^{(4)}_{11} + \delta^{(5)}_2 \cdot w^{(4)}_{21}\big) \cdot g'(z^{(4)}_1)
$$

Then we have:

$$
\frac{\partial J_i}{\partial w^{(3)}_{11}} = \delta^{(4)}_1 \cdot a^{(3)}_1
$$

Notice that this expression and our earlier result

$$
\frac{\partial J_i}{\partial w^{(L-1)}_{jk}} = \delta^{(L)}_j \cdot a^{(L-1)}_k
$$

have exactly the same structure.

So we have obtained a recurrence:

$$
\delta^{(4)}_1 = \big(\delta^{(5)}_1 \cdot w^{(4)}_{11} + \delta^{(5)}_2 \cdot w^{(4)}_{21}\big) \cdot g'(z^{(4)}_1)
$$

In the same way we can also get:

$$
\delta^{(4)}_2 = \big(\delta^{(5)}_1 \cdot w^{(4)}_{12} + \delta^{(5)}_2 \cdot w^{(4)}_{22}\big) \cdot g'(z^{(4)}_2)
$$

This can also be written in vector form (here $.*$ denotes the element-wise product):

$$
\begin{bmatrix} \delta^{(4)}_1 \\ \delta^{(4)}_2 \end{bmatrix}
= \left(\begin{bmatrix} w^{(4)}_{11} & w^{(4)}_{21} \\ w^{(4)}_{12} & w^{(4)}_{22} \end{bmatrix}
\begin{bmatrix} \delta^{(5)}_1 \\ \delta^{(5)}_2 \end{bmatrix}\right) \mathbin{.*}
\begin{bmatrix} g'(z^{(4)}_1) \\ g'(z^{(4)}_2) \end{bmatrix}
$$

In other words:

$$
\delta^{(4)} = \Big[(w^{(4)})^T\delta^{(5)}\Big] \mathbin{.*} g'(z^{(4)})
$$

Likewise, we can generalize this formula to every layer:

$$
\delta^{(l)} = \Big[ (w^{(l)})^T\delta^{(l+1)} \Big] \mathbin{.*} g'(z^{(l)})
$$

This formula is the key to back propagation.
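As a sketch of how the recurrence is used: run the forward pass while keeping every $z^{(l)}$ and $a^{(l)}$, seed $\delta^{(L)}$ at the output layer, then walk backwards. The code below assumes sigmoid activations and, to keep the indexing simple, ignores bias units; the layer sizes, weights and label are made up.

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    return g(z) * (1.0 - g(z))

# Made-up network: layer sizes 3 -> 4 -> 2 (no bias units in this sketch).
sizes = [3, 4, 2]
w = [np.random.randn(sizes[l + 1], sizes[l]) for l in range(len(sizes) - 1)]

x = np.random.randn(3)
y = np.array([1.0, 0.0])

# Forward pass, keeping every z^(l) and a^(l).
a, zs = [x], []
for wl in w:
    zs.append(wl @ a[-1])
    a.append(g(zs[-1]))

# delta^(L) at the output, then delta^(l) = [(w^(l))^T delta^(l+1)] .* g'(z^(l)).
deltas = [None] * len(w)                     # deltas[l] goes with zs[l], the layer w[l] feeds into
deltas[-1] = (a[-1] - y) * g_prime(zs[-1])
for l in range(len(w) - 2, -1, -1):
    deltas[l] = (w[l + 1].T @ deltas[l + 1]) * g_prime(zs[l])

# Per-example gradients: dJ_i/dw^(l)_{jk} = delta^(l+1)_j * a^(l)_k.
grads = [np.outer(deltas[l], a[l]) for l in range(len(w))]
print([gr.shape for gr in grads])            # each matches the shape of w[l]
```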

Then for every training example $i$ we run $BP$ once to compute $\frac{\partial J_i}{\partial w^{(l)}_{jk}}$, and use $\Delta^{(l)}_{jk}$ to accumulate the sum of these $\frac{\partial J_i}{\partial w^{(l)}_{jk}}$. After all $m$ training examples have been processed we set $D^{(l)}_{jk} = \frac 1m\Delta^{(l)}_{jk}$, which gives us:

$$
\frac{\partial}{\partial w^{(l)}_{jk}}J(w) = D^{(l)}_{jk}
$$

Then we simply run $GD$ with these gradients.
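Putting everything together, here is a compact sketch of the whole procedure under the same assumptions as before (sigmoid activations, no bias units, made-up data and learning rate): run $BP$ once per example, accumulate the gradients in $\Delta$, average them into $D$, and take a $GD$ step.

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    return g(z) * (1.0 - g(z))

def backprop(x, y, w):
    """Return the list of per-example gradients dJ_i/dw^(l) for one (x, y)."""
    a, zs = [x], []
    for wl in w:                                   # forward pass
        zs.append(wl @ a[-1])
        a.append(g(zs[-1]))
    deltas = [None] * len(w)
    deltas[-1] = (a[-1] - y) * g_prime(zs[-1])     # output-layer delta
    for l in range(len(w) - 2, -1, -1):            # backward recurrence
        deltas[l] = (w[l + 1].T @ deltas[l + 1]) * g_prime(zs[l])
    return [np.outer(deltas[l], a[l]) for l in range(len(w))]

sizes = [3, 4, 2]
w = [np.random.randn(sizes[l + 1], sizes[l]) for l in range(len(sizes) - 1)]

X = np.random.randn(5, 3)                          # m = 5 made-up examples
Y = np.eye(2)[np.random.randint(0, 2, 5)]          # one-hot labels, S_L = 2
alpha, m = 0.1, X.shape[0]                         # made-up learning rate

for epoch in range(100):
    # Delta^(l) accumulates the sum of dJ_i/dw^(l) over all examples.
    Delta = [np.zeros_like(wl) for wl in w]
    for xi, yi in zip(X, Y):
        for l, gr in enumerate(backprop(xi, yi, w)):
            Delta[l] += gr
    # D^(l) = (1/m) Delta^(l) is the gradient of J(w); one GD step:
    w = [wl - alpha * (Dl / m) for wl, Dl in zip(w, Delta)]
```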
