导数、梯度、雅可比矩阵、黑塞矩阵都是与求导相关的一些概念,比较容易混淆,本文主要是对它们的使用场景和定义进行区分。
首先需要先明确一些函数的叫法(是否多元,以粗体和非粗体进行区分):
一元函数
: f ( x ) : R ⟶ R f(x):\mathbb{R} \longrightarrow \mathbb{R} f(x):R⟶R多元函数
: f ( x ) : R n ⟶ R f(\mathbf{x}):\mathbb{R}^{n} \longrightarrow \mathbb{R} f(x):Rn⟶R向量函数
: f ( x ) : R n ⟶ R m \mathbf{f(x)}:\mathbb{R}^{n} \longrightarrow \mathbb{R}^{m} f(x):Rn⟶Rm
例如:
- 函数 y = x y=x y=x为一元函数
- 函数 y = x 1 + 2 x 2 y=x_1+2x_2 y=x1+2x2为多元函数
- 函数 { y 1 = x 1 + 2 x 2 y 2 = 2 x 1 + x 2 \begin{cases} y_1 =x_1+2x_2 \\ y_2=2x_1+x_2 \end{cases} {y1=x1+2x2y2=2x1+x2为向量函数
概念详解
导数
针对一元函数: f ( x ) : R ⟶ R f(x):\mathbb{R} \longrightarrow \mathbb{R} f(x):R⟶R,近似:
f ( x ) ≈ f ( x 0 ) + f ′ ( x 0 ) ( x − x 0 ) f(x)\approx f(x_{0})+f^{\prime}(x_{0})(x-x_{0}) f(x)≈f(x0)+f′(x0)(x−x0)
梯度
针对多元函数: f ( x ) : R n ⟶ R f(\mathbf{x}):\mathbb{R}^{n} \longrightarrow \mathbb{R} f(x):Rn⟶R,是导数的推广, 它的结果是一个向量:
▽ f = [ ∂ f ∂ x 1 ∂ f ∂ x 2 . . . ∂ f ∂ x n ] \bigtriangledown f=\begin{bmatrix} \frac{\partial f}{\partial x_{1}} \\ \frac{\partial f}{\partial x_{2}} \\ ... \\ \frac{\partial f}{\partial x_{n}} \end{bmatrix} ▽f= ∂x1∂f∂x2∂f...∂xn∂f
近似:
f ( x ) ≈ f ( x 0 ) + ▽ f ( x 0 ) ( x − x 0 ) f(\mathbf{x} )\approx f(\mathbf{x}{0})+\bigtriangledown f(\mathbf{x}{0})(\mathbf{x}-\mathbf{x}_{0}) f(x)≈f(x0)+▽f(x0)(x−x0)
雅可比矩阵
针对向量函数: f ( x ) : R n ⟶ R m \mathbf{f(x)}:\mathbb{R}^{n} \longrightarrow \mathbb{R}^{m} f(x):Rn⟶Rm
如果函数 f ( x ) : R n ⟶ R m \mathbf{f(x)}:\mathbb{R}^{n} \longrightarrow \mathbb{R}^{m} f(x):Rn⟶Rm在点 x \mathbf{x} x处可微的话,在点 x \mathbf{x} x的雅可比矩阵即为该函数在该点的最佳线性逼近 ,也代表雅可比矩阵是一元函数的导数在向量函数的推广。在这种情况下,雅可比矩阵也被称作函数 f \mathbf{f} f在点 x \mathbf{x} x的微分或者导数,其中行数为 f \mathbf{f} f的维数;列数为 x \mathbf{x} x的维度。
J = [ ∂ f ∂ x 1 . . . ∂ f ∂ x n ] = [ ∂ f 1 ∂ x 1 . . . ∂ f 1 ∂ x n ⋮ ⋱ ⋮ ∂ f m ∂ x 1 . . . ∂ f m ∂ x n ] \mathbf{J}=\begin{bmatrix} \frac{\partial \mathbf{f}}{\partial x_{1}} & ... & \frac{\partial \mathbf{f}}{\partial x_{n}} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & ... & \frac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{m}}{\partial x_{1}} & ... & \frac{\partial f_{m}}{\partial x_{n}} \end{bmatrix} J=[∂x1∂f...∂xn∂f]= ∂x1∂f1⋮∂x1∂fm...⋱...∂xn∂f1⋮∂xn∂fm
矩阵分量:
J i j = ∂ f i ∂ x j \mathbf{J}{ij}=\frac{\partial f{i}}{\partial x_{j}} Jij=∂xj∂fi
近似:
f ( x ) ≈ f ( x 0 ) + J ( x 0 ) ( x − x 0 ) \mathbf{f}(\mathbf{x} )\approx \mathbf{f}(\mathbf{x}{0})+ \mathbf{J}(\mathbf{x}{0})(\mathbf{x}-\mathbf{x}_{0}) f(x)≈f(x0)+J(x0)(x−x0)
黑塞矩阵
针对多元函数: f : R n ⟶ R f:\mathbb{R}^{n} \longrightarrow \mathbb{R} f:Rn⟶R,有点二阶导数的意思。
H = [ ∂ 2 f ∂ x 1 2 ∂ 2 f ∂ x 1 ∂ x 2 . . . ∂ 2 f ∂ x 1 ∂ x n ∂ 2 f ∂ x 2 ∂ x 1 ∂ 2 f ∂ x 2 2 . . . ∂ 2 f ∂ x 2 ∂ x n ⋮ ⋮ ⋱ ⋮ ∂ 2 f ∂ x n ∂ x 1 ∂ 2 f ∂ x n ∂ x 2 . . . ∂ 2 f ∂ x n 2 ] \mathbf{H}=\begin{bmatrix} \frac{\partial^{2} f}{\partial x_{1}^{2}} & \frac{\partial^{2} f}{\partial x_{1}\partial x_{2}} & ... & \frac{\partial^{2} f}{\partial x_{1}\partial x_{n}} \\ \frac{\partial^{2} f}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2} f}{\partial x_{2}^{2}} & ... & \frac{\partial^{2} f}{\partial x_{2}\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2} f}{\partial x_{n}\partial x_{2}} & ... & \frac{\partial^{2} f}{\partial x_{n}^{2}} \end{bmatrix} H= ∂x12∂2f∂x2∂x1∂2f⋮∂xn∂x1∂2f∂x1∂x2∂2f∂x22∂2f⋮∂xn∂x2∂2f......⋱...∂x1∂xn∂2f∂x2∂xn∂2f⋮∂xn2∂2f
矩阵分量:
H i j = ∂ 2 f ∂ x i ∂ x j \mathbf{H}{ij}=\frac{\partial^{2} f}{\partial x{i}\partial x_{j}} Hij=∂xi∂xj∂2f
近似:
f ( x ) ≈ f ( x 0 ) + ▽ f ( x 0 ) ( x − x 0 ) + 1 2 ( x − x 0 ) T H ( x 0 ) ( x − x 0 ) f(\mathbf{x} )\approx f(\mathbf{x}{0})+\bigtriangledown f(\mathbf{x}{0})(\mathbf{x}-\mathbf{x}{0}) + \frac{1}{2}(\mathbf{x}-\mathbf{x}{0})^{T}\mathbf{H}(\mathbf{x}{0})(\mathbf{x}-\mathbf{x}{0}) f(x)≈f(x0)+▽f(x0)(x−x0)+21(x−x0)TH(x0)(x−x0)
实例
对于最简单的一元函数 y = 2 x y=2x y=2x,则该一元函数的导数为: y ′ = 2 y^{\prime}=2 y′=2。这是最基础的了。
对于一个多元函数 y = x 1 4 x 2 + 3 x 2 + x 2 e x 3 y=x_1^4x_2+3x_2+x_2e^{x_3} y=x14x2+3x2+x2ex3,则:
该多元函数的梯度为:
▽ = [ ∂ y ∂ x 1 ∂ y ∂ x 2 ∂ y ∂ x 3 ] = [ 4 x 1 3 x 2 x 1 4 + 3 + e x 3 x 2 e x 3 ] \bigtriangledown =\begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \frac{\partial y}{\partial x_3} \end{bmatrix}=\begin{bmatrix} 4x_1^3x_2 \\ x_1^4+3+e^{x_3} \\ x_2e^{x_3}\end{bmatrix} ▽= ∂x1∂y∂x2∂y∂x3∂y = 4x13x2x14+3+ex3x2ex3
该多元函数的黑塞矩阵为:
H = [ ∂ 2 y ∂ x 1 2 ∂ 2 y ∂ x 1 ∂ x 2 ∂ 2 y ∂ x 1 ∂ x 3 ∂ 2 y ∂ x 2 ∂ x 1 ∂ 2 y ∂ x 2 2 ∂ 2 y ∂ x 2 ∂ x 3 ∂ 2 y ∂ x 3 ∂ x 1 ∂ 2 y ∂ x 3 ∂ x 2 ∂ 2 y ∂ x 3 2 ] = [ 12 x 1 2 x 2 4 x 1 3 0 4 x 1 3 0 e x 3 0 e x 3 x 2 e x 3 ] \mathbf{H}=\begin{bmatrix} \frac{\partial^{2} y}{\partial x_{1}^{2}} & \frac{\partial^{2} y}{\partial x_{1}\partial x_{2}} & \frac{\partial^{2} y}{\partial x_{1}\partial x_{3}} \\ \frac{\partial^{2} y}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2} y}{\partial x_{2}^{2}} & \frac{\partial^{2} y}{\partial x_{2}\partial x_{3}} \\ \frac{\partial^{2} y}{\partial x_{3}\partial x_{1}} & \frac{\partial^{2} y}{\partial x_{3}\partial x_{2}} & \frac{\partial^{2} y}{\partial x_{3}^{2}} \end{bmatrix} = \begin{bmatrix} 12x_1^2x_2 & 4x_1^3 & 0\\ 4x_1^3 & 0 & e^{x_3}\\ 0 & e^{x_3} & x_2e^{x_3} \end{bmatrix} H= ∂x12∂2y∂x2∂x1∂2y∂x3∂x1∂2y∂x1∂x2∂2y∂x22∂2y∂x3∂x2∂2y∂x1∂x3∂2y∂x2∂x3∂2y∂x32∂2y = 12x12x24x1304x130ex30ex3x2ex3
视该多元函数的梯度为一个向量函数,即:
{ y 1 = 4 x 1 3 x 2 y 2 = x 1 4 + 3 + e x 3 y 3 = x 2 e x 3 \begin{cases} y_1 =4x_1^3x_2 \\ y_2=x_1^4+3+e^{x_3} \\ y_3=x_2e^{x_3} \end{cases} ⎩ ⎨ ⎧y1=4x13x2y2=x14+3+ex3y3=x2ex3
那么,该多元函数的雅可比矩阵为:
J = [ ∂ y 1 ∂ x 1 ∂ y 1 ∂ x 2 ∂ y 1 ∂ x 3 ∂ y 2 ∂ x 1 ∂ y 2 ∂ x 2 ∂ y 2 ∂ x 3 ∂ y 3 ∂ x 1 ∂ y 3 ∂ x 2 ∂ y 3 ∂ x 3 ] = [ 12 x 1 2 x 2 4 x 1 3 0 4 x 1 3 0 e x 3 0 e x 3 x 2 e x 3 ] \mathbf{J}= \begin{bmatrix} \frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{1}}{\partial x_{3}} \\ \frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{3}} \\ \frac{\partial y_{3}}{\partial x_{1}} & \frac{\partial y_{3}}{\partial x_{2}} & \frac{\partial y_{3}}{\partial x_{3}} \end{bmatrix} = \begin{bmatrix} 12x_1^2x_2 & 4x_1^3 & 0\\ 4x_1^3 & 0 & e^{x_3}\\ 0 & e^{x_3} & x_2e^{x_3} \end{bmatrix} J= ∂x1∂y1∂x1∂y2∂x1∂y3∂x2∂y1∂x2∂y2∂x2∂y3∂x3∂y1∂x3∂y2∂x3∂y3 = 12x12x24x1304x130ex30ex3x2ex3
可以看出,黑塞矩阵是多元函数 f ( x ) f(\mathbf{x}) f(x)的梯度对自变量 x \mathbf{x} x的雅可比矩阵。
总结
- 梯度是雅可比矩阵的一个特例 :当向量函数为标量函数时( f \mathbf{f} f向量维度为1),雅可比矩阵是梯度向量
- 黑塞矩阵是多元函数 f ( x ) f(\mathbf{x}) f(x)的梯度对自变量 x \mathbf{x} x的雅可比矩阵