Deriving the Gradient Formulas for Neural Network Pooling Layers

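Prerequisite: Deriving the Gradient Formulas for Neural Network Convolutional Layers
Code: NumPy Reference Implementations of Common Neural Network Layers (6): Convolutional Layer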

Average pooling layer

Data structures

Suppose the input image is fed into a pooling layer that also handles padding. The input has the following structure:
$$
{{\bf{X}}_{(H \times W)}} = \left[ {\begin{array}{cccc} {{x_1}^{(1)}}&{{x_1}^{(2)}}& \cdots &{{x_1}^{(W)}}\\ {{x_2}^{(1)}}&{{x_2}^{(2)}}& \cdots &{{x_2}^{(W)}}\\ \cdots & \cdots & \cdots & \cdots \\ {{x_H}^{(1)}}&{{x_H}^{(2)}}& \cdots &{{x_H}^{(W)}} \end{array}} \right]
$$

The output shape is computed as follows, where $H$ and $W$ are the height and width before padding and $H_k \times W_k$ is the pooling window size:
$$
\begin{array}{l} {H_{out}} = \left\lfloor {\frac{{H + 2 \times {\rm{padding}} - {H_k}}}{{{\rm{stride}}}}} \right\rfloor + 1\\ {W_{out}} = \left\lfloor {\frac{{W + 2 \times {\rm{padding}} - {W_k}}}{{{\rm{stride}}}}} \right\rfloor + 1 \end{array}
$$
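The shape formula is easy to check numerically. The following is a minimal sketch; the function name `pool_output_shape` and its parameter names are hypothetical and only chosen for illustration.

```python
def pool_output_shape(h, w, kernel_h, kernel_w, stride=1, padding=0):
    """Return (H_out, W_out) for an h x w input (sizes before padding)."""
    # Integer floor division implements the floor brackets in the formula.
    h_out = (h + 2 * padding - kernel_h) // stride + 1
    w_out = (w + 2 * padding - kernel_w) // stride + 1
    return h_out, w_out


print(pool_output_shape(4, 4, 2, 2, stride=2))             # (2, 2)
print(pool_output_shape(5, 5, 3, 3, stride=2, padding=1))  # (3, 3)
```

When the window does not tile the input exactly, the floor simply drops the incomplete window at the border.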

Applying the im2col algorithm to ${{\bf{X}}_{(H \times W)}}$ produces a col matrix of the following shape, with one row per pooling window:
$$
\mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} = \left[ {\begin{array}{cccc} {{{\hat x}_1}^{(1)}}&{{{\hat x}_1}^{(2)}}& \cdots &{{{\hat x}_1}^{({H_k}{W_k})}}\\ {{{\hat x}_2}^{(1)}}&{{{\hat x}_2}^{(2)}}& \cdots &{{{\hat x}_2}^{({H_k}{W_k})}}\\ \cdots & \cdots & \cdots & \cdots \\ {{{\hat x}_{{H_{out}}{W_{out}}}}^{(1)}}&{{{\hat x}_{{H_{out}}{W_{out}}}}^{(2)}}& \cdots &{{{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}} \end{array}} \right]
$$
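To make the col layout concrete, here is a minimal loop-based im2col sketch for a single 2-D image. It is not the implementation from the linked NumPy article; the name `im2col_2d` is hypothetical, and zero padding is an assumption (for max pooling, padding with a very negative value is often preferred).

```python
import numpy as np


def im2col_2d(x, kernel_h, kernel_w, stride=1, padding=0):
    """Unfold a 2-D array into a (H_out*W_out, kernel_h*kernel_w) matrix."""
    x_pad = np.pad(x, padding, mode="constant")  # zero padding (assumption)
    h_out = (x.shape[0] + 2 * padding - kernel_h) // stride + 1
    w_out = (x.shape[1] + 2 * padding - kernel_w) // stride + 1
    cols = np.empty((h_out * w_out, kernel_h * kernel_w), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            window = x_pad[top:top + kernel_h, left:left + kernel_w]
            cols[i * w_out + j] = window.ravel()  # one window per row
    return cols


x = np.arange(16.0).reshape(4, 4)
x_col = im2col_2d(x, 2, 2, stride=2)
print(x_col.shape)  # (4, 4)
print(x_col[0])     # [0. 1. 4. 5.]
```

Row `i * w_out + j` holds the window whose output lands at position `(i, j)`, which is why the pooled result can later be reshaped back to `H_out × W_out`.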


Forward pass

Simply take the mean of $\mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})}$ along its last dimension (for the output, $\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)}$ must then be reshaped to ${H_{out}} \times {W_{out}}$):
$$
\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} = \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} .{\mathop{\rm mean}\nolimits} \left( {axis = - 1} \right) = \frac{1}{{{H_k}{W_k}}} \times \left[ {\begin{array}{c} {\sum\nolimits_i^{{H_k}{W_k}} {{{\hat x}_1}^{(i)}} }\\ {\sum\nolimits_i^{{H_k}{W_k}} {{{\hat x}_2}^{(i)}} }\\ \cdots \\ {\sum\nolimits_i^{{H_k}{W_k}} {{{\hat x}_{{H_{out}}{W_{out}}}}^{(i)}} } \end{array}} \right] = \left[ {\begin{array}{c} {{y_1}}\\ {{y_2}}\\ \cdots \\ {{y_{{H_{out}}{W_{out}}}}} \end{array}} \right],\quad {y_{{H_{out}}{W_{out}}}} = \frac{1}{{{H_k}{W_k}}} \times \sum\nolimits_i^{{H_k}{W_k}} {{{\hat x}_{{H_{out}}{W_{out}}}}^{(i)}}
$$
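The forward pass above can be sketched in a few lines of NumPy. This is an illustrative sketch without padding; `avg_pool_forward` is a hypothetical name, and the col matrix is built with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later) rather than a hand-written im2col.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def avg_pool_forward(x, kernel_h, kernel_w, stride):
    """Average-pool a 2-D array (no padding) via a col matrix."""
    # All windows at stride 1, then keep every `stride`-th one per axis.
    windows = sliding_window_view(x, (kernel_h, kernel_w))[::stride, ::stride]
    h_out, w_out = windows.shape[:2]
    x_col = windows.reshape(h_out * w_out, kernel_h * kernel_w)
    y_flat = x_col.mean(axis=-1)          # mean over each window
    return y_flat.reshape(h_out, w_out)   # back to H_out x W_out


x = np.arange(16.0).reshape(4, 4)
print(avg_pool_forward(x, 2, 2, stride=2))
# [[ 2.5  4.5]
#  [10.5 12.5]]
```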

This output feeds the loss:
$$
loss = L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)
$$

The gradient passed into this layer from the layer above has the following structure:
$$
\mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} = \frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial \mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} }} = \left[ {\begin{array}{c} {\frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial {y_1}}}}\\ {\frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial {y_2}}}}\\ \cdots \\ {\frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial {y_{{H_{out}}{W_{out}}}}}}} \end{array}} \right] = \left[ {\begin{array}{c} {{g_1}}\\ {{g_2}}\\ \cdots \\ {{g_{{H_{out}}{W_{out}}}}} \end{array}} \right],\quad {g_{{H_{out}}{W_{out}}}} = \frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial {y_{{H_{out}}{W_{out}}}}}}
$$


Backward pass

We differentiate the loss directly with respect to the last element of the input data; by the chain rule this gives the general term of the output gradient:
$$
\frac{{\partial loss}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}} = \frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}} = {g_{{H_{out}}{W_{out}}}} \cdot \frac{{\partial \left( {\frac{1}{{{H_k}{W_k}}} \times \sum\nolimits_i^{{H_k}{W_k}} {{{\hat x}_{{H_{out}}{W_{out}}}}^{(i)}} } \right)}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}} = \frac{1}{{{H_k}{W_k}}} \times {g_{{H_{out}}{W_{out}}}}
$$

Collecting these entries into the full gradient matrix:
$$
\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} = \frac{{\partial loss}}{{\partial \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} }} = \frac{1}{{{H_k}{W_k}}} \times \left[ {\begin{array}{cccc} {{g_1}}&{{g_1}}& \cdots &{{g_1}}\\ {{g_2}}&{{g_2}}& \cdots &{{g_2}}\\ \cdots & \cdots & \cdots & \cdots \\ {{g_{{H_{out}}{W_{out}}}}}&{{g_{{H_{out}}{W_{out}}}}}& \cdots &{{g_{{H_{out}}{W_{out}}}}} \end{array}} \right] = \frac{1}{{{H_k}{W_k}}} \times \left( {{{\bf{0}}_{({H_{out}}{W_{out}} \times {H_k}{W_k})}} + \mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)
$$

So to obtain $\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})}$, it suffices to flatten the incoming gradient ${\bf{Gra}}{{\bf{d}}_{({H_{out}} \times {W_{out}})}}$ into $\mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)}$, broadcast it to ${\bf{GRA}}{{\bf{D}}_{({H_{out}}{W_{out}} \times {H_k}{W_k})}}$, and finally divide by ${H_k} \times {W_k}$.
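The flatten, broadcast, and divide steps map directly onto NumPy broadcasting. A minimal sketch, assuming a single 2-D gradient map; `avg_pool_dxcol` is a hypothetical name.

```python
import numpy as np


def avg_pool_dxcol(grad, kernel_h, kernel_w):
    """Gradient w.r.t. the col matrix for average pooling."""
    k = kernel_h * kernel_w
    grad_flat = grad.reshape(-1, 1)                # (H_out*W_out, 1)
    zeros = np.zeros((grad_flat.shape[0], k))      # (H_out*W_out, Hk*Wk)
    return (zeros + grad_flat) / k                 # broadcast, then scale


grad = np.array([[4.0, 8.0], [12.0, 16.0]])
print(avg_pool_dxcol(grad, 2, 2))
# [[1. 1. 1. 1.]
#  [2. 2. 2. 2.]
#  [3. 3. 3. 3.]
#  [4. 4. 4. 4.]]
```

Adding the column vector to a zero matrix mirrors the formula literally; `np.broadcast_to` or `np.repeat` would give the same values.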

Finally, $\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})}$ is mapped back to image layout with the col2im algorithm, accumulating the contributions of overlapping windows, which yields the output gradient (the gradient with respect to the layer input):
$$
{\bf{GRAD'}}_{(H \times W)} = \sum {{\mathop{\rm col2im}\nolimits} \left( {\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} } \right)}
$$
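The accumulation matters whenever windows overlap (stride smaller than the window). Below is a minimal loop-based col2im sketch for a single 2-D image; `col2im_2d` is a hypothetical name, and cropping the padded border at the end is an assumption about how padding is undone.

```python
import numpy as np


def col2im_2d(d_col, h, w, kernel_h, kernel_w, stride=1, padding=0):
    """Scatter-add rows of d_col back into an h x w gradient image."""
    h_out = (h + 2 * padding - kernel_h) // stride + 1
    w_out = (w + 2 * padding - kernel_w) // stride + 1
    d_pad = np.zeros((h + 2 * padding, w + 2 * padding), dtype=d_col.dtype)
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            patch = d_col[i * w_out + j].reshape(kernel_h, kernel_w)
            # "+=" accumulates gradients where windows overlap.
            d_pad[top:top + kernel_h, left:left + kernel_w] += patch
    # Drop the padded border so the result matches the input shape.
    return d_pad[padding:padding + h, padding:padding + w]


d_col = np.ones((4, 4))  # 2x2 windows, stride 1, on a 3x3 input
print(col2im_2d(d_col, 3, 3, 2, 2, stride=1))
# [[1. 2. 1.]
#  [2. 4. 2.]
#  [1. 2. 1.]]
```

The centre pixel belongs to all four windows, so it receives four contributions, while each corner receives one.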

The formulas are unchanged when extended to batched data:
$$
\begin{array}{l} \mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times 1)} = \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} .{\mathop{\rm mean}\nolimits} \left( {axis = - 1} \right)\\ \mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} = \frac{1}{{{H_k}{W_k}}} \times \left( {{{\bf{0}}_{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})}} + \mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times 1)} } \right)\\ {\bf{GRAD'}}_{(N \times C \times H \times W)} = \sum {{\mathop{\rm col2im}\nolimits} \left( {\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} } \right)} \end{array}
$$
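One way to gain confidence in the batched formulas is a finite-difference check. The sketch below assumes a square window, no padding, and hypothetical function names; it compares the analytic gradient with central differences of the scalar `sum(Y * g)`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def avg_pool2d_forward(x, k, stride):
    """x: (N, C, H, W), square k x k window, no padding."""
    win = sliding_window_view(x, (k, k), axis=(2, 3))[:, :, ::stride, ::stride]
    n, c, h_out, w_out = win.shape[:4]
    x_col = win.reshape(n, c, h_out * w_out, k * k)
    return x_col.mean(axis=-1).reshape(n, c, h_out, w_out)


def avg_pool2d_backward(grad, x_shape, k, stride):
    """grad: (N, C, H_out, W_out); returns the gradient w.r.t. x."""
    n, c, h_out, w_out = grad.shape
    grad_flat = grad.reshape(n, c, h_out * w_out, 1)
    d_col = (np.zeros((n, c, h_out * w_out, k * k)) + grad_flat) / (k * k)
    dx = np.zeros(x_shape)
    for i in range(h_out):          # col2im with accumulation
        for j in range(w_out):
            patch = d_col[:, :, i * w_out + j].reshape(n, c, k, k)
            dx[:, :, i * stride:i * stride + k, j * stride:j * stride + k] += patch
    return dx


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 5, 5))
g = rng.normal(size=(2, 3, 2, 2))   # H_out = W_out = (5 - 3) // 2 + 1 = 2
dx = avg_pool2d_backward(g, x.shape, 3, 2)

eps = 1e-6
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += eps
    xm[idx] -= eps
    diff = avg_pool2d_forward(xp, 3, 2) - avg_pool2d_forward(xm, 3, 2)
    dx_num[idx] = (diff * g).sum() / (2 * eps)

print(np.allclose(dx, dx_num, atol=1e-6))
```

Because average pooling is linear in its input, the central difference agrees with the analytic gradient up to floating-point rounding.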


Max pooling layer

Forward pass

By analogy with the average pooling layer, the forward pass of the max pooling layer becomes:
$$
\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} = \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} .{\mathop{\rm max}\nolimits} \left( {axis = - 1} \right) = \left[ {\begin{array}{c} {\max \left( {{{\hat x}_1}^{(1)},{{\hat x}_1}^{(2)}, \cdots ,{{\hat x}_1}^{({H_k}{W_k})}} \right)}\\ {\max \left( {{{\hat x}_2}^{(1)},{{\hat x}_2}^{(2)}, \cdots ,{{\hat x}_2}^{({H_k}{W_k})}} \right)}\\ \cdots \\ {\max \left( {{{\hat x}_{{H_{out}}{W_{out}}}}^{(1)},{{\hat x}_{{H_{out}}{W_{out}}}}^{(2)}, \cdots ,{{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}} \right)} \end{array}} \right] = \left[ {\begin{array}{c} {{y_1}}\\ {{y_2}}\\ \cdots \\ {{y_{{H_{out}}{W_{out}}}}} \end{array}} \right]
$$

For each output entry:
$$
{y_{{H_{out}}{W_{out}}}} = \max \left( {{{\hat x}_{{H_{out}}{W_{out}}}}^{(1)},{{\hat x}_{{H_{out}}{W_{out}}}}^{(2)}, \cdots ,{{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}} \right)
$$
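The max pooling forward pass differs from average pooling only in the reduction over the last axis. A minimal sketch without padding, using a hypothetical `max_pool_forward` helper built on `sliding_window_view` (NumPy 1.20 or later):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_forward(x, kernel_h, kernel_w, stride):
    """Max-pool a 2-D array (no padding); also return the col matrix."""
    windows = sliding_window_view(x, (kernel_h, kernel_w))[::stride, ::stride]
    h_out, w_out = windows.shape[:2]
    x_col = windows.reshape(h_out * w_out, kernel_h * kernel_w)
    y_flat = x_col.max(axis=-1)                   # max over each window
    return y_flat.reshape(h_out, w_out), x_col    # x_col is reused in backward


x = np.arange(16.0).reshape(4, 4)
y, x_col = max_pool_forward(x, 2, 2, stride=2)
print(y)
# [[ 5.  7.]
#  [13. 15.]]
```

Returning `x_col` alongside the output is a design choice for this sketch: the backward pass needs to know where each maximum came from.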


Backward pass

In the same way, we differentiate the loss with respect to the last input element:
$$
\frac{{\partial loss}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}} = \frac{{\partial L\left( {\mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} } \right)}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}} = {g_{{H_{out}}{W_{out}}}} \cdot \frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}}
$$

Putting the entries together (here $\times$ denotes element-wise multiplication, with $\mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)}$ broadcast along the last dimension):
$$
\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} = \frac{{\partial loss}}{{\partial \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} }} = \left[ {\begin{array}{cccc} {{g_1} \cdot \frac{{\partial {y_1}}}{{\partial {{\hat x}_1}^{(1)}}}}&{{g_1} \cdot \frac{{\partial {y_1}}}{{\partial {{\hat x}_1}^{(2)}}}}& \cdots &{{g_1} \cdot \frac{{\partial {y_1}}}{{\partial {{\hat x}_1}^{({H_k}{W_k})}}}}\\ {{g_2} \cdot \frac{{\partial {y_2}}}{{\partial {{\hat x}_2}^{(1)}}}}&{{g_2} \cdot \frac{{\partial {y_2}}}{{\partial {{\hat x}_2}^{(2)}}}}& \cdots &{{g_2} \cdot \frac{{\partial {y_2}}}{{\partial {{\hat x}_2}^{({H_k}{W_k})}}}}\\ \cdots & \cdots & \cdots & \cdots \\ {{g_{{H_{out}}{W_{out}}}} \cdot \frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{(1)}}}}&{{g_{{H_{out}}{W_{out}}}} \cdot \frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{(2)}}}}& \cdots &{{g_{{H_{out}}{W_{out}}}} \cdot \frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}}} \end{array}} \right] = \mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} \times \frac{{\partial \mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} }}{{\partial \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} }}
$$

The factor $\frac{{\partial \mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} }}{{\partial \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} }}$ is a Jacobian written in compact form: each $y_j$ depends only on row $j$ of the col matrix, so only those partial derivatives are kept. That is:
$$
\frac{{\partial \mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{({H_{out}}{W_{out}} \times 1)} }}{{\partial \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} }} = {\left[ {\begin{array}{cccc} {\frac{{\partial {y_1}}}{{\partial {{\hat x}_1}^{(1)}}}}&{\frac{{\partial {y_1}}}{{\partial {{\hat x}_1}^{(2)}}}}& \cdots &{\frac{{\partial {y_1}}}{{\partial {{\hat x}_1}^{({H_k}{W_k})}}}}\\ {\frac{{\partial {y_2}}}{{\partial {{\hat x}_2}^{(1)}}}}&{\frac{{\partial {y_2}}}{{\partial {{\hat x}_2}^{(2)}}}}& \cdots &{\frac{{\partial {y_2}}}{{\partial {{\hat x}_2}^{({H_k}{W_k})}}}}\\ \cdots & \cdots & \cdots & \cdots \\ {\frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{(1)}}}}&{\frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{(2)}}}}& \cdots &{\frac{{\partial {y_{{H_{out}}{W_{out}}}}}}{{\partial {{\hat x}_{{H_{out}}{W_{out}}}}^{({H_k}{W_k})}}}} \end{array}} \right]_{({H_{out}}{W_{out}} \times {H_k}{W_k})}} = \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})} .{\mathop{\rm argmax}\nolimits} \left( {axis = - 1} \right).{\mathop{\rm eye}\nolimits} \left( {axis = - 1} \right)
$$
Here $.{\mathop{\rm argmax}\nolimits}(axis = -1).{\mathop{\rm eye}\nolimits}(axis = -1)$ is shorthand for the one-hot encoding of each row's argmax: the derivative of a maximum is 1 at the position that attains the maximum and 0 everywhere else.
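The `.argmax(axis=-1).eye(axis=-1)` notation is not a literal NumPy call chain. One common way to build the same one-hot mask is to index an identity matrix with the argmax indices, as in the sketch below (`max_pool_dxcol` is a hypothetical name).

```python
import numpy as np


def max_pool_dxcol(x_col, grad_flat):
    """Gradient w.r.t. the col matrix for max pooling."""
    idx = x_col.argmax(axis=-1)             # position of the max in each row
    mask = np.eye(x_col.shape[-1])[idx]     # one-hot rows, same shape as x_col
    return grad_flat.reshape(-1, 1) * mask  # route each g_j to its argmax


x_col = np.array([[1.0, 3.0, 2.0, 0.0],
                  [4.0, 4.0, 1.0, 0.0]])
grad_flat = np.array([10.0, 20.0])
print(max_pool_dxcol(x_col, grad_flat))
# [[ 0. 10.  0.  0.]
#  [20.  0.  0.  0.]]
```

Note that `argmax` returns the first maximum when a row contains ties, so in the second row the whole gradient goes to index 0. Splitting the gradient among tied entries is another convention; which one the original implementation uses is not stated here.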

Finally, as before, $\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{({H_{out}}{W_{out}} \times {H_k}{W_k})}$ is mapped back and accumulated with the col2im algorithm. The formulas are likewise unchanged for batched data:
$$
\begin{array}{l} \mathop {{{\bf{Y}}_{{\bf{flat}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times 1)} = \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} .{\mathop{\rm max}\nolimits} \left( {axis = - 1} \right)\\ \mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} = \mathop {{\bf{Gra}}{{\bf{d}}_{{\bf{flat}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times 1)} \times \mathop {{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} .{\mathop{\rm argmax}\nolimits} \left( {axis = - 1} \right).{\mathop{\rm eye}\nolimits} \left( {axis = - 1} \right)\\ {\bf{GRAD'}}_{(N \times C \times H \times W)} = \sum {{\mathop{\rm col2im}\nolimits} \left( {\mathop {{\bf{d}}{{\bf{X}}_{{\bf{col}}}}}\limits^{(N \times C \times {H_{out}}{W_{out}} \times {H_k}{W_k})} } \right)} \end{array}
$$
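The batched max pooling formulas can be checked the same way as any other gradient. The sketch below assumes a square window, no padding, and hypothetical function names. The test input uses distinct integers so that every window has a unique maximum and the finite-difference comparison is well defined.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool2d_forward(x, k, stride):
    """x: (N, C, H, W). Returns the pooled output and the col tensor."""
    win = sliding_window_view(x, (k, k), axis=(2, 3))[:, :, ::stride, ::stride]
    n, c, h_out, w_out = win.shape[:4]
    x_col = win.reshape(n, c, h_out * w_out, k * k)
    return x_col.max(axis=-1).reshape(n, c, h_out, w_out), x_col


def max_pool2d_backward(grad, x_col, x_shape, k, stride):
    """Route each upstream gradient to the argmax of its window."""
    n, c, h_out, w_out = grad.shape
    mask = np.eye(k * k)[x_col.argmax(axis=-1)]        # (N, C, M, k*k) one-hot
    d_col = grad.reshape(n, c, h_out * w_out, 1) * mask
    dx = np.zeros(x_shape)
    for i in range(h_out):                              # col2im with accumulation
        for j in range(w_out):
            patch = d_col[:, :, i * w_out + j].reshape(n, c, k, k)
            dx[:, :, i * stride:i * stride + k, j * stride:j * stride + k] += patch
    return dx


rng = np.random.default_rng(0)
x = rng.permutation(2 * 3 * 5 * 5).astype(float).reshape(2, 3, 5, 5)
g = rng.normal(size=(2, 3, 2, 2))    # H_out = W_out = (5 - 3) // 2 + 1 = 2
y, x_col = max_pool2d_forward(x, 3, 2)
dx = max_pool2d_backward(g, x_col, x.shape, 3, 2)

eps = 1e-3                           # far smaller than the gap between values
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += eps
    xm[idx] -= eps
    diff = max_pool2d_forward(xp, 3, 2)[0] - max_pool2d_forward(xm, 3, 2)[0]
    dx_num[idx] = (diff * g).sum() / (2 * eps)

print(np.allclose(dx, dx_num, atol=1e-6))
```

With stride 2 and a 3×3 window the windows overlap by one row and column, so an element that is the maximum of several windows receives the sum of their gradients through the `+=` accumulation.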
