2.4.3. Gradient
We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the function's *gradient* vector. Specifically, suppose the input of the function $f:\mathbb{R}^{n}\to\mathbb{R}$ is an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$ and its output is a scalar. The gradient of $f(\vec x)$ with respect to $\vec x$ is a vector of $n$ partial derivatives:

$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\\frac{\partial f(\vec x)}{\partial x_2}\\\vdots\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$

where $\nabla_{\vec x} f(\vec x)$ is usually written as $\nabla f(\vec x)$ when there is no ambiguity.
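As a concrete illustration of this definition, the following minimal sketch computes the gradient of an arbitrary example function with PyTorch's autograd. Both PyTorch and the particular function $f$ are assumptions made here purely for illustration; they do not come from the text above.

```python
import torch

# An arbitrary 3-dimensional example point (chosen only for illustration)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# f(x) = x1^2 + 3*x2 + sin(x3): a scalar-valued function of the vector x
f = x[0] ** 2 + 3 * x[1] + torch.sin(x[2])
f.backward()  # fills x.grad with the vector of partial derivatives

# Hand-computed partials: [2*x1, 3, cos(x3)]
expected = torch.stack([2 * x[0], torch.tensor(3.0), torch.cos(x[2])]).detach()
print(torch.allclose(x.grad, expected))  # True
```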
Assume $\vec x$ is an $n$-dimensional vector. The following rules are frequently used when differentiating multivariate functions:
##### 1. For all $A \in \mathbb{R}^{m\times n}$, $\nabla_{\vec x}\, A\vec x = A^\top$
Proof: Let

$$A_{(m,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},$$

then

$$A\vec x_{(m,1)} = \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix},$$

so

$$\nabla_{\vec x}A\vec x = \begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\ \frac{\partial A\vec x}{\partial x_2}\\ \vdots\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix}
= \begin{bmatrix}
\frac{\partial (a_{1,1}x_1+\cdots+a_{1,n}x_n)}{\partial x_1} & \frac{\partial (a_{2,1}x_1+\cdots+a_{2,n}x_n)}{\partial x_1} & \cdots & \frac{\partial (a_{m,1}x_1+\cdots+a_{m,n}x_n)}{\partial x_1}\\
\frac{\partial (a_{1,1}x_1+\cdots+a_{1,n}x_n)}{\partial x_2} & \frac{\partial (a_{2,1}x_1+\cdots+a_{2,n}x_n)}{\partial x_2} & \cdots & \frac{\partial (a_{m,1}x_1+\cdots+a_{m,n}x_n)}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial (a_{1,1}x_1+\cdots+a_{1,n}x_n)}{\partial x_n} & \frac{\partial (a_{2,1}x_1+\cdots+a_{2,n}x_n)}{\partial x_n} & \cdots & \frac{\partial (a_{m,1}x_1+\cdots+a_{m,n}x_n)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1}\\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{1,n}&a_{2,n}&\cdots&a_{m,n} \end{bmatrix}
= A^\top.$$
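A minimal numerical sanity check of rule 1, assuming PyTorch is available (not code from the book): autograd's `jacobian` returns the $m\times n$ Jacobian of $A\vec x$, whose transpose is the gradient in the layout used here.

```python
import torch

m, n = 4, 3
A = torch.randn(m, n)
x = torch.randn(n)

# Jacobian of v -> A v has entries d(Av)_i / dv_j, i.e. an (m, n) matrix equal to A
J = torch.autograd.functional.jacobian(lambda v: A @ v, x)
print(torch.allclose(J, A))  # True; transposing gives nabla_x(Ax) = A^T in this layout
```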
##### 2. For all $A \in \mathbb{R}^{n\times m}$, $\nabla_{\vec x}\, \vec x^\top A = A$
Proof: Let

$$A_{(n,m)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,m} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,m} \end{bmatrix},$$

then

$$\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+\cdots+a_{n,1}x_n & a_{1,2}x_1+\cdots+a_{n,2}x_n & \cdots & a_{1,m}x_1+\cdots+a_{n,m}x_n \end{bmatrix},$$

so

$$\nabla_{\vec x}\vec x^\top A = \begin{bmatrix}\frac{\partial \vec x^\top A}{\partial x_1}\\ \frac{\partial \vec x^\top A}{\partial x_2}\\ \vdots\\ \frac{\partial \vec x^\top A}{\partial x_n}\end{bmatrix}
= \begin{bmatrix}
\frac{\partial (a_{1,1}x_1+\cdots+a_{n,1}x_n)}{\partial x_1} & \frac{\partial (a_{1,2}x_1+\cdots+a_{n,2}x_n)}{\partial x_1} & \cdots & \frac{\partial (a_{1,m}x_1+\cdots+a_{n,m}x_n)}{\partial x_1}\\
\frac{\partial (a_{1,1}x_1+\cdots+a_{n,1}x_n)}{\partial x_2} & \frac{\partial (a_{1,2}x_1+\cdots+a_{n,2}x_n)}{\partial x_2} & \cdots & \frac{\partial (a_{1,m}x_1+\cdots+a_{n,m}x_n)}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial (a_{1,1}x_1+\cdots+a_{n,1}x_n)}{\partial x_n} & \frac{\partial (a_{1,2}x_1+\cdots+a_{n,2}x_n)}{\partial x_n} & \cdots & \frac{\partial (a_{1,m}x_1+\cdots+a_{n,m}x_n)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,m}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,m} \\ \vdots&\vdots&\ddots&\vdots \\ a_{n,1}&a_{n,2}&\cdots&a_{n,m} \end{bmatrix}
= A.$$
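A minimal numerical sanity check of rule 2, again assuming PyTorch (illustration only): the Jacobian of $\vec x^\top A$ has shape $(m, n)$ and equals $A^\top$, so the gradient in the $(n, m)$ layout used here is $A$ itself.

```python
import torch

n, m = 3, 4
A = torch.randn(n, m)
x = torch.randn(n)

# Jacobian of v -> v @ A has shape (m, n) and equals A^T
J = torch.autograd.functional.jacobian(lambda v: v @ A, x)
print(torch.allclose(J.T, A))  # True: nabla_x (x^T A) = A
```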
##### 3. For all $A \in \mathbb{R}^{n\times n}$, $\nabla_{\vec x}\, \vec x^\top A \vec x = (A+A^\top)\vec x$

Proof: Let

$$A_{(n,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,n} \end{bmatrix},$$

then

$$\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+\cdots+a_{n,1}x_n & a_{1,2}x_1+\cdots+a_{n,2}x_n & \cdots & a_{1,n}x_1+\cdots+a_{n,n}x_n \end{bmatrix},
\qquad
\vec x^\top A \vec x = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_i x_j,$$

so

$$\nabla_{\vec x}\, \vec x^\top A \vec x
= \begin{bmatrix}
\frac{\partial \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_i x_j}{\partial x_1}\\
\frac{\partial \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_i x_j}{\partial x_2}\\
\vdots\\
\frac{\partial \sum_{i=1}^{n}\sum_{j=1}^{n} a_{i,j}x_i x_j}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
\sum_{i=1}^{n}(a_{i,1}+a_{1,i})x_i\\
\sum_{i=1}^{n}(a_{i,2}+a_{2,i})x_i\\
\vdots\\
\sum_{i=1}^{n}(a_{i,n}+a_{n,i})x_i
\end{bmatrix}
= \begin{bmatrix}
2a_{1,1} & a_{1,2}+a_{2,1} & \cdots & a_{1,n}+a_{n,1}\\
a_{2,1}+a_{1,2} & 2a_{2,2} & \cdots & a_{2,n}+a_{n,2}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & \cdots & 2a_{n,n}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
= (A+A^\top)\vec x.$$

##### 4. $\nabla_{\vec x} \Vert \vec x \Vert^2 = \nabla_{\vec x}\, \vec x^\top \vec x = 2\vec x$

Proof:

$$\nabla_{\vec x}\Vert \vec x \Vert^2
= \nabla_{\vec x}\left(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}\right)^2
= \nabla_{\vec x}\left(x_1^2+x_2^2+\cdots+x_n^2\right)
= \nabla_{\vec x}\, \vec x^\top \vec x
= \begin{bmatrix} 2x_1\\ 2x_2\\ \vdots\\ 2x_n \end{bmatrix}
= 2\vec x.$$

Similarly, for any matrix $X$ we have $\nabla_X \Vert X \Vert_F^2 = 2X$. As we will see later, gradients are very useful for designing optimization algorithms in deep learning.
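A minimal numerical sanity check of rules 3 and 4, and of the Frobenius-norm identity just mentioned, again assuming PyTorch is available (illustration only, not the book's code):

```python
import torch

n = 4
A = torch.randn(n, n)

# Rule 3: nabla_x (x^T A x) = (A + A^T) x
x = torch.randn(n, requires_grad=True)
(x @ A @ x).backward()
print(torch.allclose(x.grad, (A + A.T) @ x.detach()))  # True

# Rule 4: nabla_x ||x||^2 = 2x
x = torch.randn(n, requires_grad=True)
(x @ x).backward()
print(torch.allclose(x.grad, 2 * x.detach()))  # True

# Frobenius norm: nabla_X ||X||_F^2 = 2X
X = torch.randn(3, 5, requires_grad=True)
(X ** 2).sum().backward()
print(torch.allclose(X.grad, 2 * X.detach()))  # True
```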
##### 5. For any matrix $X$, $\nabla_X \Vert X \Vert_F^2 = 2X$

Proof: Let $X$ be an $m\times n$ matrix,

$$X = \begin{bmatrix} x_{1,1}& x_{1,2}&\cdots&x_{1,n}\\ x_{2,1}& x_{2,2}&\cdots&x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{m,1}& x_{m,2}&\cdots&x_{m,n} \end{bmatrix},$$

then

$$\Vert X \Vert_F^2 = \left(\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{i,j}^2}\right)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{i,j}^2,$$

so

$$\nabla_X \Vert X \Vert_F^2 = \begin{bmatrix} 2x_{1,1}& 2x_{1,2}&\cdots&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&\cdots&2x_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ 2x_{m,1}& 2x_{m,2}&\cdots&2x_{m,n} \end{bmatrix} = 2X.$$

***I didn't fully understand these formulas at first glance, so I worked through the derivations myself to reinforce them. The above is that derivation; questions and discussion are welcome.***