Matrix calculus {矩阵微积分}
- [1. Matrix calculus (矩阵微积分)](#1. Matrix calculus (矩阵微积分))
- [2. Numerator layout (分子布局) - Denominator layout (分母布局)](#2. Numerator layout (分子布局) - Denominator layout (分母布局))
- [3. Derivatives with vectors (向量求导) - Derivatives with matrices (矩阵求导)](#3. Derivatives with vectors (向量求导) - Derivatives with matrices (矩阵求导))
- [3.1. Scalar-by-vector (标量对向量求导)](#3.1. Scalar-by-vector (标量对向量求导))
- [3.2. Vector-by-scalar (向量对标量求导)](#3.2. Vector-by-scalar (向量对标量求导))
- [3.3. Scalar-by-matrix (标量对矩阵求导)](#3.3. Scalar-by-matrix (标量对矩阵求导))
- [3.4. Matrix-by-scalar (矩阵对标量求导)](#3.4. Matrix-by-scalar (矩阵对标量求导))
- [3.5. Vector-by-vector (向量对向量求导)](#3.5. Vector-by-vector (向量对向量求导))
- References
1. Matrix calculus (矩阵微积分)
Let $M(n, m)$ denote the space of real $n \times m$ matrices with $n$ rows and $m$ columns, which is the same as $\mathbb{R}^{n \times m}$. A general matrix in this space is written as a bold uppercase letter, e.g. $\mathbf{A}$, $\mathbf{X}$, $\mathbf{Y}$. A matrix that belongs to $M(n, 1)$, i.e. a column vector, is written as a bold lowercase letter, e.g. $\mathbf{a}$, $\mathbf{x}$, $\mathbf{y}$. An element of $M(1, 1)$ is a scalar and is written as a lowercase italic letter, e.g. $a$, $x$, $y$.
$\mathbf{X}^{\mathsf{T}}$ denotes the matrix transpose, $\text{tr}(\mathbf{X})$ the trace, and $\det(\mathbf{X})$ or $|\mathbf{X}|$ the determinant.
There are a total of nine possibilities using scalars, vectors, and matrices.
| Types | Scalar | Vector | Matrix |
|---|---|---|---|
| Scalar | $\frac{\partial y}{\partial x}$ | $\frac{\partial \mathbf{y}}{\partial x}$ | $\frac{\partial \mathbf{Y}}{\partial x}$ |
| Vector | $\frac{\partial y}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | |
| Matrix | $\frac{\partial y}{\partial \mathbf{X}}$ | | |
The notations developed here can accommodate the usual operations of vector calculus by identifying the space $M(n, 1)$ of $n$-vectors with the Euclidean space $\mathbb{R}^{n}$, and the space of scalars $M(1, 1)$ with $\mathbb{R}$.
For a scalar function of three independent variables, $f(x_1, x_2, x_3)$, the gradient is given by the vector equation
$$\nabla f = \frac{\partial f}{\partial x_1} \hat{x}_1 + \frac{\partial f}{\partial x_2} \hat{x}_2 + \frac{\partial f}{\partial x_3} \hat{x}_3,$$
where $\hat{x}_i$ represents a unit vector in the $x_i$ direction for $1 \le i \le 3$. This type of generalized derivative can be seen as the derivative of a scalar, $f$, with respect to a vector, $\mathbf{x}$, and its result can be easily collected in vector form:
$$\nabla f = \left( \frac{\partial f}{\partial \mathbf{x}} \right)^{\mathsf{T}} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3} \end{bmatrix}^{\mathsf{T}}.$$
More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix.
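The gradient of a three-variable scalar function can also be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper `gradient`, the step size `h`, and the example function `f` are all assumptions introduced here for illustration, and the derivative is approximated by central differences rather than computed symbolically.

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at the point x
    (a list of floats) with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example function: f(x1, x2, x3) = x1^2 + x2 * x3
    x1, x2, x3 = x
    return x1 ** 2 + x2 * x3


# Analytically, the gradient is [2*x1, x3, x2], i.e. [2, 3, 2] at (1, 2, 3).
grad_f = gradient(f, [1.0, 2.0, 3.0])
print(grad_f)  # three values close to 2, 3 and 2
```

Each entry of `grad_f` approximates one partial derivative $\frac{\partial f}{\partial x_i}$, so the list plays the role of the collected vector form of $\nabla f$.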
2. Numerator layout (分子布局) - Denominator layout (分母布局)
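Numerator-layout notation: when a derivative is written as a vector or matrix, its number of rows equals the dimension of the quantity in the numerator of the derivative expression.
Denominator-layout notation: when a derivative is written as a vector or matrix, its number of rows equals the dimension of the quantity in the denominator of the derivative expression.
A result in numerator layout and the same result in denominator layout are transposes of each other.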
numerator [ˈnjuːməˌreɪtə(r)]
n. the number above the line in a fraction
denominator [dɪˈnɒmɪˌneɪtə(r)]
n. the number below the line in a fraction
The fundamental issue is that the derivative of a vector with respect to a vector, i.e. $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, is often written in two competing ways. If the numerator $\mathbf{y}$ is of size $m$ and the denominator $\mathbf{x}$ of size $n$, then the result can be laid out as either an $m \times n$ matrix or an $n \times m$ matrix, i.e. with rows indexed by the $m$ elements of $\mathbf{y}$ and columns indexed by the $n$ elements of $\mathbf{x}$, or vice versa. This leads to the following possibilities:
- Numerator layout, i.e. lay out according to $\mathbf{y}$ and $\mathbf{x}^{\mathsf{T}}$ (i.e. contrarily to $\mathbf{x}$). This is sometimes known as the Jacobian formulation. It corresponds to the $m \times n$ layout in the previous example: the number of rows of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of the numerator $\mathbf{y}$, and the number of columns equals the size of $\mathbf{x}^{\mathsf{T}}$.
- Denominator layout, i.e. lay out according to $\mathbf{y}^{\mathsf{T}}$ and $\mathbf{x}$ (i.e. contrarily to $\mathbf{y}$). This is sometimes known as the Hessian formulation. Some authors term this layout the gradient, in distinction to the Jacobian (numerator layout), which is its transpose. (However, gradient more commonly means the derivative $\frac{\partial y}{\partial \mathbf{x}}$, regardless of layout.) It corresponds to the $n \times m$ layout in the previous example: the number of rows of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of the denominator $\mathbf{x}$.
- A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and to follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces the same results as the numerator layout.
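The difference between the first two layouts can be made concrete with a small numerical sketch. Everything named below (`jacobian_numerator`, `transpose`, the example map `func`, and the step size `h`) is hypothetical and introduced only for illustration. The code approximates the partial derivatives by central differences, stores them in numerator layout, and obtains the denominator layout by transposing.

```python
def jacobian_numerator(func, x, h=1e-6):
    """Central-difference derivative in numerator layout:
    entry [i][j] approximates d y_i / d x_j, giving an m x n nested list."""
    m = len(func(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        y_plus = func(forward)
        y_minus = func(backward)
        for i in range(m):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)
    return jac


def transpose(mat):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*mat)]


def func(x):
    # Hypothetical map from R^2 to R^3, so n = 2 and m = 3
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


num = jacobian_numerator(func, [1.0, 2.0])  # numerator layout, m x n
den = transpose(num)                        # denominator layout, n x m

print(len(num), len(num[0]))  # 3 2
print(len(den), len(den[0]))  # 2 3
```

Both nested lists hold the same six partial derivatives; only the roles of the row and column indices are exchanged.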
| With respect to | Layout | Scalar $y$: notation | Scalar $y$: type | Column vector $\mathbf{y}$ (size $m \times 1$): notation | Column vector $\mathbf{y}$ (size $m \times 1$): type | Matrix $\mathbf{Y}$ (size $m \times n$): notation | Matrix $\mathbf{Y}$ (size $m \times n$): type |
|---|---|---|---|---|---|---|---|
| Scalar $x$ | Numerator | $\frac{\partial y}{\partial x}$ | Scalar | $\frac{\partial \mathbf{y}}{\partial x}$ | Size-$m$ column vector | $\frac{\partial \mathbf{Y}}{\partial x}$ | $m \times n$ matrix |
| Scalar $x$ | Denominator | $\frac{\partial y}{\partial x}$ | Scalar | $\frac{\partial \mathbf{y}}{\partial x}$ | Size-$m$ row vector | $\frac{\partial \mathbf{Y}}{\partial x}$ | - |
| Column vector $\mathbf{x}$ (size $n \times 1$) | Numerator | $\frac{\partial y}{\partial \mathbf{x}}$ | Size-$n$ row vector | $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | $m \times n$ matrix | $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ | - |
| Column vector $\mathbf{x}$ (size $n \times 1$) | Denominator | $\frac{\partial y}{\partial \mathbf{x}}$ | Size-$n$ column vector | $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | $n \times m$ matrix | $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ | - |
| Matrix $\mathbf{X}$ (size $p \times q$) | Numerator | $\frac{\partial y}{\partial \mathbf{X}}$ | $q \times p$ matrix | $\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ | - | $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ | - |
| Matrix $\mathbf{X}$ (size $p \times q$) | Denominator | $\frac{\partial y}{\partial \mathbf{X}}$ | $p \times q$ matrix | $\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ | - | $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ | - |
The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
3. Derivatives with vectors (向量求导) - Derivatives with matrices (矩阵求导)
3.1. Scalar-by-vector (标量对向量求导)
The derivative of a scalar $y$ by a vector $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\mathsf{T}}$ is written (in numerator layout notation) as
$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$
Using denominator-layout notation, we have:
$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}.$$
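As a short worked example of the two layouts for a scalar-by-vector derivative, consider the squared norm of $\mathbf{x}$. The choice of function is an assumption made here purely for illustration.

```latex
\begin{aligned}
y &= \mathbf{x}^{\mathsf{T}} \mathbf{x} = \sum_{i=1}^{n} x_i^2
  \quad\Longrightarrow\quad \frac{\partial y}{\partial x_i} = 2 x_i, \\
\text{numerator layout:}\quad
\frac{\partial y}{\partial \mathbf{x}}
  &= \begin{bmatrix} 2x_1 & 2x_2 & \cdots & 2x_n \end{bmatrix}
   = 2\mathbf{x}^{\mathsf{T}}, \\
\text{denominator layout:}\quad
\frac{\partial y}{\partial \mathbf{x}}
  &= \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix}
   = 2\mathbf{x}.
\end{aligned}
```

The entries are identical in both cases; the numerator layout collects them in a $1 \times n$ row vector and the denominator layout in an $n \times 1$ column vector.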
3.2. Vector-by-scalar (向量对标量求导)
The derivative of a vector $\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^{\mathsf{T}}$ by a scalar $x$ is written (in numerator layout notation) as
$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}.$$
A scalar can be regarded as a 1-dimensional vector. In numerator layout, the derivative of an $m$-dimensional vector with respect to a scalar is therefore an $m \times 1$ matrix, i.e. an $m$-dimensional column vector.
In vector calculus the derivative of a vector $\mathbf{y}$ with respect to a scalar $x$ is known as the tangent vector of the vector $\mathbf{y}$, $\frac{\partial \mathbf{y}}{\partial x}$.
Using denominator-layout notation, we have:
$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}.$$
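The tangent-vector interpretation can be sketched numerically. The curve `curve`, the helper `tangent`, and the step size `h` below are hypothetical names chosen for this illustration, and the derivative is a central-difference approximation.

```python
import math


def curve(t):
    # Hypothetical vector-valued function of a scalar: a helix
    # y(t) = [cos t, sin t, t]^T, so m = 3
    return [math.cos(t), math.sin(t), t]


def tangent(y, t, h=1e-6):
    """Approximate d y / d t componentwise with central differences."""
    y_plus = y(t + h)
    y_minus = y(t - h)
    return [(a - b) / (2 * h) for a, b in zip(y_plus, y_minus)]


# Analytically, d y / d t = [-sin t, cos t, 1], i.e. [0, 1, 1] at t = 0.
tangent_flat = tangent(curve, 0.0)

# Numerator layout: an m x 1 column vector (one row per component of y).
numerator_layout = [[v] for v in tangent_flat]
# Denominator layout: a 1 x m row vector.
denominator_layout = [tangent_flat]

print(len(numerator_layout), len(numerator_layout[0]))      # 3 1
print(len(denominator_layout), len(denominator_layout[0]))  # 1 3
```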
3.3. Scalar-by-matrix (标量对矩阵求导)
The derivative of a scalar function $y$ with respect to a $p \times q$ matrix $\mathbf{X}$ of independent variables is given (in numerator layout notation) by
$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$
Using denominator-layout notation, we have:
$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$
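A numerical sketch can make the two shapes tangible. The example function $y = \mathbf{a}^{\mathsf{T}} \mathbf{X} \mathbf{b}$, the constants `a` and `b`, and the helper `grad_denominator` are assumptions introduced here and are not part of the text above. For this function $\frac{\partial y}{\partial x_{ij}} = a_i b_j$, so the denominator-layout result is the $p \times q$ matrix $\mathbf{a}\mathbf{b}^{\mathsf{T}}$.

```python
A_VEC = [1.0, 2.0]        # hypothetical constant vector a, size p = 2
B_VEC = [3.0, 4.0, 5.0]   # hypothetical constant vector b, size q = 3


def scalar_fn(X):
    """Hypothetical scalar function of a p x q matrix: y = a^T X b."""
    return sum(A_VEC[i] * X[i][j] * B_VEC[j]
               for i in range(len(A_VEC))
               for j in range(len(B_VEC)))


def grad_denominator(fn, X, h=1e-6):
    """Central-difference derivative of a scalar by a matrix in denominator
    layout: entry [i][j] approximates d y / d x_ij (same p x q shape as X)."""
    p, q = len(X), len(X[0])
    grad = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (fn(plus) - fn(minus)) / (2 * h)
    return grad


X = [[0.5, -1.0, 2.0],
     [1.5, 0.0, -0.5]]

den = grad_denominator(scalar_fn, X)    # denominator layout, p x q
num = [list(col) for col in zip(*den)]  # numerator layout, q x p

print(len(den), len(den[0]))  # 2 3
print(len(num), len(num[0]))  # 3 2
```

Keeping the derivative in the same shape as $\mathbf{X}$ is what makes the denominator layout convenient for gradient-based updates of a matrix of parameters; the numerator layout is simply its transpose.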
3.4. Matrix-by-scalar (矩阵对标量求导)
The derivative of a matrix function $\mathbf{Y}$ by a scalar $x$ is known as the tangent matrix and is given (in numerator layout notation only) by
$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$
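For a concrete instance of a matrix-by-scalar derivative, take a $2 \times 2$ rotation matrix whose angle is the scalar $x$. This example is an assumption chosen here for illustration; each entry is differentiated separately and stays in its original position.

```latex
\mathbf{Y}(x) =
\begin{bmatrix}
\cos x & -\sin x \\
\sin x & \cos x
\end{bmatrix}
\quad\Longrightarrow\quad
\frac{\partial \mathbf{Y}}{\partial x} =
\begin{bmatrix}
-\sin x & -\cos x \\
\cos x & -\sin x
\end{bmatrix}.
```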
3.5. Vector-by-vector (向量对向量求导)
The derivative of a vector function (a vector whose components are functions) $\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^{\mathsf{T}}$ with respect to an input vector $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\mathsf{T}}$ is written (in numerator layout notation) as
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$
In vector calculus, the derivative of a vector function $\mathbf{y}$ with respect to a vector $\mathbf{x}$ whose components represent a space is known as the pushforward (or differential), or the Jacobian matrix.
Using denominator-layout notation, we have:
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$
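A linear map gives perhaps the simplest worked example of a vector-by-vector derivative. The constant matrix $\mathbf{A}$ with entries $a_{ij}$ is an assumption introduced here for illustration.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{A}\mathbf{x}, \quad \mathbf{A} \in M(m, n)
  \quad\Longrightarrow\quad
  y_i = \sum_{j=1}^{n} a_{ij} x_j, \qquad
  \frac{\partial y_i}{\partial x_j} = a_{ij}, \\
\text{numerator layout } (m \times n):\quad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \mathbf{A}, \\
\text{denominator layout } (n \times m):\quad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \mathbf{A}^{\mathsf{T}}.
\end{aligned}
```

In numerator layout the entry in row $i$ and column $j$ is $\frac{\partial y_i}{\partial x_j}$, so the derivative reproduces $\mathbf{A}$ itself; in denominator layout the indices are exchanged, which gives the transpose.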
References
[1] Yongqiang Cheng (程永强),