The Four Fundamental Subspaces: 4 Lines
Gilbert Strang, Massachusetts Institute of Technology
1. Introduction
The expression "Four Fundamental Subspaces" has become familiar to thousands of linear algebra students. Those subspaces are the column space and the nullspace of $A$ and $A^{T}$. They lift the understanding of $Ax = b$ to a higher level---a subspace level. The first step sees $Ax$ (matrix times vector) as a combination of the columns of $A$. Those vectors $Ax$ fill the column space $C(A)$. When we move from one combination to all combinations (by allowing every $x$), a subspace appears. $Ax = b$ has a solution exactly when $b$ is in the column space of $A$.
The next section of this note will introduce all four subspaces. They are connected by the Fundamental Theorem of Linear Algebra. A perceptive reader may recognize the Singular Value Decomposition, when Part 3 of this theorem provides perfect bases for the four subspaces. The three parts are well separated in a linear algebra course! The first part goes as far as the dimensions of the subspaces, using the rank. The second part is their orthogonality---two subspaces in $R^{n}$ and two in $R^{m}$. The third part needs eigenvalues and eigenvectors of $A^{T}A$ to find the best bases. Figure 1 will show the "big picture" of linear algebra, with the four bases added in Figure 2.
The main purpose of this paper is to see that theorem in action. We choose a matrix of rank one. When $m = n = 2$, all four fundamental subspaces are lines in $R^{2}$. The big picture is particularly clear, and some would say the four lines are trivial. But the angle between $x$ and $y$ decides the eigenvalues of $A$ and its Jordan form---those go beyond the Fundamental Theorem. We are seeing the orthogonal geometry that comes from singular vectors and the skew geometry that comes from eigenvectors. One leads to singular values and the other leads to eigenvalues.
Examples are amazingly powerful. I hope this family of 2 by 2 matrices fills a space between working with a specific numerical example and an arbitrary matrix.
2. The Four Subspaces
Figure 1 shows the fundamental subspaces for an $m$ by $n$ matrix of rank $r$. It is useful to fix ideas with a 3 by 4 matrix of rank 2:

$$A = \left[\begin{array}{llll} 1 & 0 & 2 & 3 \\ 0 & 1 & 4 & 5 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
That matrix is in row reduced echelon form and it shows what elimination can accomplish. The column space of $A$ and the nullspace of $A^{T}$ have very simple bases:
$$\left[\begin{array}{l}1 \\ 0 \\ 0\end{array}\right] \text{ and } \left[\begin{array}{l}0 \\ 1 \\ 0\end{array}\right] \text{ span } C(A), \qquad \left[\begin{array}{l}0 \\ 0 \\ 1\end{array}\right] \text{ spans } N(A^{T}).$$

After transposing, the first two rows of $A$ are a basis for the row space---and they also tell us a basis for the nullspace:

$$\left[\begin{array}{l}1 \\ 0 \\ 2 \\ 3\end{array}\right] \text{ and } \left[\begin{array}{l}0 \\ 1 \\ 4 \\ 5\end{array}\right] \text{ span } C(A^{T}), \qquad \left[\begin{array}{r}-2 \\ -4 \\ 1 \\ 0\end{array}\right] \text{ and } \left[\begin{array}{r}-3 \\ -5 \\ 0 \\ 1\end{array}\right] \text{ span } N(A).$$

The last two vectors are orthogonal to the first two. But these are not orthogonal bases. Elimination is enough to give Part 1 of the Fundamental Theorem:

$$\text{Part 1} \qquad \begin{aligned}&\text{The column space and row space have equal dimension } r = \text{rank} \\ &\text{The nullspace } N(A) \text{ has dimension } n - r; \ N(A^{T}) \text{ has dimension } m - r\end{aligned}$$

That counting of basis vectors is obvious for the row reduced $\mathrm{rref}(A)$. This matrix has $r$ nonzero rows and $r$ pivot columns. The proof of Part 1 is in the reversibility of every elimination step---to confirm that linear independence and dimension are not changed.

Figure 1: Dimensions and orthogonality for any $m$ by $n$ matrix $A$ of rank $r$

Part 2 says that the row space and nullspace are orthogonal complements. The orthogonality comes directly from the equation $Ax = 0$. Each $x$ in the nullspace is orthogonal to each row:

$$Ax = 0 \qquad \left[\begin{array}{c}(\text{row } 1) \\ \cdots \\ (\text{row } m)\end{array}\right]\Big[\,x\,\Big] = \left[\begin{array}{c}0 \\ \cdots \\ 0\end{array}\right] \qquad \begin{matrix}\leftarrow x \text{ is orthogonal to row } 1 \\ {} \\ \leftarrow x \text{ is orthogonal to row } m\end{matrix}$$

The dimensions of $C(A^{T})$ and $N(A)$ add to $n$.
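These dimension counts and orthogonality relations are easy to confirm numerically. Here is a minimal sketch (my addition, not part of the original essay) using NumPy and SciPy, whose `null_space` returns an orthonormal basis for a nullspace:

```python
# Numerical check of Parts 1 and 2 for the 3 by 4 example of rank r = 2.
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 0., 2., 3.],
              [0., 1., 4., 5.],
              [0., 0., 0., 0.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

NA = null_space(A)        # orthonormal basis for N(A),   shape (n, n - r)
NAT = null_space(A.T)     # orthonormal basis for N(A^T), shape (m, m - r)

print(r, n - r, m - r)               # 2 2 1
print(NA.shape[1], NAT.shape[1])     # 2 1, matching n - r and m - r
print(np.allclose(A @ NA, 0))        # True: every row of A is orthogonal to N(A)
print(np.allclose(A.T @ NAT, 0))     # True: every column of A is orthogonal to N(A^T)
```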
Every vector in $R^{n}$ is accounted for, by separating $x$ into $x_{\text{row}} + x_{\text{null}}$.

For the 90° angle on the right side of Figure 1, change $A$ to $A^{T}$. Every vector $b = Ax$ in the column space is orthogonal to every solution of $A^{T}y = 0$.

$$\text{Part 2} \qquad \begin{aligned}C(A^{T}) &= N(A)^{\perp} \qquad \text{orthogonal complements in } R^{n} \\ N(A^{T}) &= C(A)^{\perp} \qquad \text{orthogonal complements in } R^{m}\end{aligned}$$

Part 3 of the Fundamental Theorem creates orthonormal bases for the four subspaces. More than that, the matrix is diagonal with respect to those bases $v_{1}, \dots, v_{n}$ and $u_{1}, \dots, u_{m}$. From row space to column space this is $Av_{i} = \sigma_{i}u_{i}$ for $i = 1, \dots, r$. The other basis vectors are in the nullspaces: $Av_{i} = 0$ and $A^{T}u_{i} = 0$ for $i > r$. When the $u$'s and $v$'s are columns of orthogonal matrices $U$ and $V$, we have the Singular Value Decomposition $A = U\Sigma V^{T}$:

$$\text{Part 3} \qquad AV = A\left[\,v_{1} \cdots v_{r} \cdots v_{n}\,\right] = \left[\,u_{1} \cdots u_{r} \cdots u_{m}\,\right]\left[\begin{array}{ccc}\sigma_{1} & & \\ & \ddots & \\ & & \sigma_{r}\end{array}\right] = U\Sigma.$$

The $v$'s are orthonormal eigenvectors of $A^{T}A$, with eigenvalues $\sigma_{i}^{2} \geq 0$. Then the eigenvector matrix $V$ diagonalizes $A^{T}A = (V\Sigma^{T}U^{T})(U\Sigma V^{T}) = V(\Sigma^{T}\Sigma)V^{T}$. Similarly $U$ diagonalizes $AA^{T}$. When matrices are not symmetric or square, it is $A^{T}A$ and $AA^{T}$ that make things right.
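Part 3 can also be checked directly on the same 3 by 4 example. A short sketch with NumPy's `svd` (again my addition):

```python
# Part 3 in action: the SVD supplies orthonormal bases with A v_i = sigma_i u_i.
import numpy as np

A = np.array([[1., 0., 2., 3.],
              [0., 1., 4., 5.],
              [0., 0., 0., 0.]])
U, s, Vt = np.linalg.svd(A)          # A = U Sigma V^T; U is 3x3, Vt is 4x4
r = int(np.sum(s > 1e-12))           # numerical rank, here 2

for i in range(r):                   # A v_i = sigma_i u_i for i = 1, ..., r
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])

# The eigenvector matrix V diagonalizes A^T A, with eigenvalues sigma_i^2
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(evals[:r], s[:r] ** 2))   # True
```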
This summary is completed by one more matrix: the pseudoinverse. This matrix $A^{+}$ inverts $A$ where that is possible, from column space back to row space. It has the same nullspace as $A^{T}$. It gives the shortest solution to $Ax = b$, because $A^{+}b$ is the particular solution in the row space: $AA^{+}b = b$. Every matrix is invertible from row space to column space, and $A^{+}$ provides the inverse:

$$\text{Pseudoinverse} \qquad A^{+}u_{i} = \frac{v_{i}}{\sigma_{i}} \ \text{ for } i = 1, \dots, r.$$

Figure 2: Orthonormal bases that diagonalize $A$ (3 by 4) and $A^{+}$ (4 by 3)

Figure 2 shows the four subspaces with orthonormal bases and the action of $A$ and $A^{+}$. The product $A^{+}A$ is the orthogonal projection of $R^{n}$ onto the row space---as near to the identity matrix as possible. Certainly $A^{+}$ is $A^{-1}$ when that inverse exists.

3. Matrices of Rank One

Our goal is a full understanding of rank one matrices $A = xy^{T}$. The columns of $A$ are multiples of $x$, so the column space $C(A)$ is a line. The rows of $A$ are multiples of $y^{T}$, so the row space $C(A^{T})$ is the line through $y$ (column vector convention). Let $x$ and $y$ be unit vectors to make the scaling attractive. Since all the action concerns those two vectors, we stay in $R^{2}$:

$$A = xy^{T} = \left[\begin{array}{ll}x_{1}y_{1} & x_{1}y_{2} \\ x_{2}y_{1} & x_{2}y_{2}\end{array}\right]. \qquad \text{The trace is } x_{1}y_{1} + x_{2}y_{2} = x^{T}y.$$

The nullspace of $A$ is the line orthogonal to $y$. It is in the direction of $y^{\perp}$. The algebra gives $Ay^{\perp} = (xy^{T})y^{\perp} = 0$ and the geometry is on the left side of Figure 3. The good basis vectors are $y$ and $y^{\perp}$. On the right side, the bases for the column space of $A$ and the nullspace of $A^{T}$ are the orthogonal unit vectors $x$ and $x^{\perp}$.
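Before drawing the picture, the four lines can be verified numerically. A sketch under the stated conventions, with an arbitrary illustrative angle `theta` and a helper `perp` that rotates a vector by 90°:

```python
# The four subspaces of A = x y^T as four lines in R^2.
import numpy as np

theta = 0.4                                    # illustrative angle, my choice
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])   # unit vectors with x . y = cos(theta)
perp = lambda v: np.array([-v[1], v[0]])       # 90-degree rotation

A = np.outer(x, y)                             # A = x y^T, rank one
print(np.linalg.matrix_rank(A))                # 1
print(np.allclose(A @ perp(y), 0))             # True: y_perp spans N(A)
print(np.allclose(A.T @ perp(x), 0))           # True: x_perp spans N(A^T)
print(np.isclose(np.trace(A), x @ y))          # True: trace = x^T y
```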
Figure 3: The fundamental subspaces for $A = xy^{T}$ are four lines in $R^{2}$

4. Eigenvalues of $xy^{T}$

The eigenvalues of $A$ were not mentioned in the Fundamental Theorem. Eigenvectors are not normally orthogonal. They belong to the column space and the nullspace, not a natural pair of subspaces. One subspace is in $R^{n}$, one is in $R^{m}$, and they are comparable (but usually not orthogonal) only when $m = n$. The eigenvectors of the singular 2 by 2 matrix $A = xy^{T}$ are $x$ and $y^{\perp}$:

$$\text{Eigenvectors} \qquad Ax = (xy^{T})x = x(y^{T}x) \quad \text{and} \quad Ay^{\perp} = (xy^{T})y^{\perp} = 0.$$

The new and crucial number is that first eigenvalue $\lambda_{1} = y^{T}x = \cos\theta$. This is the trace since $\lambda_{2} = 0$. The angle $\theta$ between row space and column space decides the orientation in Figure 3. The extreme cases $\theta = 0$ and $\theta = \pi/2$ produce matrices of the best kind and the worst kind:

- Best: $\cos\theta = 1$ when $x = y$. Then $A = xx^{T}$ is symmetric with $\lambda = 1, 0$.
- Worst: $\cos\theta = 0$ when $x = y^{\perp}$. Then $A = y^{\perp}y^{T}$ has trace zero with $\lambda = 0, 0$.

"Worst" is a short form of "nondiagonalizable". The eigenvalue $\lambda = 0$ is repeated and the two eigenvectors $x$ and $y^{\perp}$ coincide when the trace $y^{T}x$ is zero. At that point $A$ cannot be similar to the diagonal matrix of its eigenvalues (because this would be the zero matrix).
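A quick numerical confirmation of these eigenvalue statements, reusing the same illustrative setup:

```python
# Eigenvalues of A = x y^T: lambda_1 = y^T x = cos(theta) (the trace), lambda_2 = 0.
import numpy as np

theta = 0.4
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])
A = np.outer(x, y)

lam = np.linalg.eigvals(A)
print(np.sort(lam.real))                  # approximately [0, cos(theta)]
print(np.allclose(A @ x, (y @ x) * x))    # True: A x = (y^T x) x

# Worst case x = y_perp: double eigenvalue 0, yet B is not the zero matrix
B = np.outer(np.array([-np.sin(theta), np.cos(theta)]), y)
print(np.linalg.eigvals(B))               # [0, 0] up to roundoff
```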
The right choice of $Q^{-1}AQ$ will produce the Jordan form in this extreme case when $x$ and $y$ are orthonormal:

$$J = \left[\begin{array}{l}x^{T} \\ y^{T}\end{array}\right]\Big[\,xy^{T}\,\Big]\left[\begin{array}{ll}x & y\end{array}\right] = \left[\begin{array}{l}x^{T} \\ y^{T}\end{array}\right]\left[\begin{array}{ll}0 & x\end{array}\right] = \left[\begin{array}{ll}0 & 1 \\ 0 & 0\end{array}\right]$$
Jordan chose the best basis ($x$ and $y$) to put $xy^{T}$ in that famous form, with an off-diagonal 1 to signal a missing eigenvector. The SVD will choose two different orthonormal bases to put $xy^{T}$ in its diagonal form $\Sigma$.
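The Jordan computation above can be replayed with any orthonormal pair $x$, $y$; a brief check (the angle `a` is an arbitrary choice):

```python
# When x and y are orthonormal, Q = [x  y] turns A = x y^T into J = [[0, 1], [0, 0]].
import numpy as np

a = 0.3
x = np.array([np.cos(a), np.sin(a)])
y = np.array([-np.sin(a), np.cos(a)])    # y = x_perp, so y^T x = 0
A = np.outer(x, y)
Q = np.column_stack([x, y])              # orthogonal, so Q^{-1} = Q^T

print(np.round(Q.T @ A @ Q, 12))         # [[0, 1], [0, 0]] = J
```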
5. Factorizations of $A = xy^{T}$
By bringing together three important ways to factor this matrix, you can see the end result of each approach and how that goal is reached. We still have $A = xy^{T}$ and $\operatorname{rank}(A) = 1$. The end results are $\Sigma$, $\Lambda$, and $T$.
A. Singular Value Decomposition
$$U^{T}AV = \left[\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right] = \Sigma$$
B. Diagonalization by eigenvectors
$$S^{-1}AS = \left[\begin{array}{cc}\cos\theta & 0 \\ 0 & 0\end{array}\right] = \Lambda$$
C. Orthogonal triangularization
$$Q^{T}AQ = \left[\begin{array}{cc}\cos\theta & \sin\theta \\ 0 & 0\end{array}\right] = T$$
The columns of $U$, $V$, $S$, and $Q$ will be $x$, $y$, $y^{\perp}$, and $x^{\perp}$. They come in different orders!
A. In the SVD, the columns of $U$ and $V$ are orthonormal bases for the four subspaces. Figure 3 shows $u_{1} = x$ in the column space and $v_{1} = y$ in the row space. Then $Ay = (xy^{T})y$ correctly gives $x$ with $\sigma_{1} = 1$. The nullspace bases are $u_{2} = x^{\perp}$ and $v_{2} = y^{\perp}$. Notice the different bases in $U$ and $V$, from the reversal of $x$ and $y$:
$$U^{T}AV = \left[\begin{array}{ll}x & x^{\perp}\end{array}\right]^{T}\Big[\,xy^{T}\,\Big]\left[\begin{array}{ll}y & y^{\perp}\end{array}\right] = \left[\begin{array}{ll}x & x^{\perp}\end{array}\right]^{T}\left[\begin{array}{ll}x & 0\end{array}\right] = \left[\begin{array}{ll}1 & 0 \\ 0 & 0\end{array}\right] = \Sigma$$
The pseudoinverse of $xy^{T}$ is $yx^{T}$. The norm of $A$ is $\sigma_{1} = 1$.
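A direct check of this factorization, including the pseudoinverse $yx^{T}$ and the norm (a sketch with the same illustrative `theta`):

```python
# A. SVD check: with U = [x  x_perp] and V = [y  y_perp], U^T A V = diag(1, 0).
import numpy as np

theta = 0.4
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])
perp = lambda v: np.array([-v[1], v[0]])
A = np.outer(x, y)

U = np.column_stack([x, perp(x)])
V = np.column_stack([y, perp(y)])
print(np.round(U.T @ A @ V, 12))                       # [[1, 0], [0, 0]] = Sigma
print(np.allclose(np.linalg.pinv(A), np.outer(y, x)))  # True: A^+ = y x^T
print(np.isclose(np.linalg.norm(A, 2), 1.0))           # True: norm = sigma_1 = 1
```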
B. In diagonalization, the eigenvectors of $A = xy^{T}$ are $x$ and $y^{\perp}$. Those are the columns of the eigenvector matrix $S$, and its determinant is $y^{T}x = \cos\theta$. The eigenvectors of $A^{T} = yx^{T}$ are $y$ and $x^{\perp}$, which go into the rows of $S^{-1}$ (after division by $\cos\theta$):
$$S^{-1}AS = \frac{1}{\cos\theta}\left[\begin{array}{ll}y & x^{\perp}\end{array}\right]^{T}\Big[\,xy^{T}\,\Big]\left[\begin{array}{ll}x & y^{\perp}\end{array}\right] = \left[\begin{array}{ll}y & x^{\perp}\end{array}\right]^{T}\left[\begin{array}{ll}x & 0\end{array}\right] = \left[\begin{array}{cc}\cos\theta & 0 \\ 0 & 0\end{array}\right] = \Lambda$$
This diagonalization fails when $\cos\theta = 0$ and $S$ is singular. The Jordan form jumps from $\Lambda$ to $J$, as that off-diagonal 1 suddenly appears.
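The same setup confirms this diagonalization and exhibits $\det S = \cos\theta$, the source of the failure at $\theta = \pi/2$:

```python
# B. Eigenvector diagonalization: S = [x  y_perp] gives S^{-1} A S = diag(cos(theta), 0).
import numpy as np

theta = 0.4
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])
perp = lambda v: np.array([-v[1], v[0]])
A = np.outer(x, y)

S = np.column_stack([x, perp(y)])                    # eigenvector matrix [x  y_perp]
print(np.isclose(np.linalg.det(S), np.cos(theta)))   # True: det S = y^T x
print(np.round(np.linalg.solve(S, A @ S), 12))       # [[cos(theta), 0], [0, 0]] = Lambda
```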
C. One of the many useful discoveries of Issai Schur is that every square matrix is unitarily similar to a triangular matrix:
$$Q^{*}AQ = T \quad \text{with} \quad Q^{*}Q = I.$$
His construction starts with the unit eigenvector $x$ in the first column of $Q$. In our 2 by 2 case, the construction ends immediately with $x^{\perp}$ in the second column:
$$Q^{T}AQ = \left[\begin{array}{l}x^{T} \\ (x^{\perp})^{T}\end{array}\right]\Big[\,xy^{T}\,\Big]\left[\begin{array}{ll}x & x^{\perp}\end{array}\right] = \left[\begin{array}{l}y^{T} \\ 0^{T}\end{array}\right]\left[\begin{array}{ll}x & x^{\perp}\end{array}\right] = \left[\begin{array}{cc}\cos\theta & \sin\theta \\ 0 & 0\end{array}\right] = T$$
This triangular matrix $T$ still has norm 1, since $Q$ is unitary. Numerically $T$ is far more stable than the diagonal form $\Lambda$. In fact $T$ survives in the limit $\cos\theta = 0$ of coincident eigenvectors, when it becomes $J$.
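SciPy's `schur` performs exactly this kind of unitary triangularization; a sketch comparing its output with $T$ (the ordering and signs in the computed factor are not guaranteed to match):

```python
# C. Schur triangularization: an orthogonal Q with Q^T A Q upper triangular.
import numpy as np
from scipy.linalg import schur

theta = 0.4
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])
A = np.outer(x, y)

T, Q = schur(A)                      # A = Q T Q^T with Q orthogonal, T triangular
print(np.round(T, 12))               # diagonal holds the eigenvalues cos(theta) and 0
print(np.allclose(Q @ T @ Q.T, A))   # True
```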
Note: The triangular form $T$ is not so widely used, but it gives an elementary proof of a seemingly obvious fact: a random small perturbation of any square matrix is almost sure to produce distinct eigenvalues. What is the best proof?
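That perturbation experiment takes a few lines in NumPy: a tiny random perturbation of the Jordan block $J$ almost surely splits the double eigenvalue, with a split of order $\sqrt{\varepsilon}$:

```python
# Perturbing J = [[0, 1], [0, 0]] by entries of size ~1e-10 splits lambda = 0, 0.
import numpy as np

rng = np.random.default_rng(0)
J = np.array([[0.0, 1.0], [0.0, 0.0]])
E = 1e-10 * rng.standard_normal((2, 2))
print(np.linalg.eigvals(J + E))      # two distinct eigenvalues of size ~1e-5
```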
More controversially, I wonder if Schur can be regarded as the greatest linear algebraist of all time?
Summary
The four fundamental subspaces, coming from $A = xy^{T}$ and from $A^{T} = yx^{T}$, are four lines in $R^{2}$. Their directions are given by $x$, $x^{\perp}$, $y$, and $y^{\perp}$. The eigenvectors of $A$ and $A^{T}$ are the same four vectors. But there is a crucial crossover in the pictures of Figures 1-2-3. The eigenvectors of $A$ lie in its column space and nullspace, not a natural pair. The dimensions of the spaces add to $n = 2$, but the spaces are not orthogonal and they could even coincide.
The better picture is the orthogonal one that leads to the SVD.
References
These are among the textbooks that present the four subspaces.
- David Lay, Linear Algebra and Its Applications, Third edition, Addison-Wesley (2003).
- Peter Olver and Chehrzad Shakiban, Applied Linear Algebra, Pearson Prentice-Hall (2006).
- Theodore Shifrin and Malcolm Adams, Linear Algebra: A Geometric Approach, Freeman (2001).
- Gilbert Strang, Linear Algebra and Its Applications, Fourth edition, Cengage (previously Brooks/Cole) (2006).
- Gilbert Strang, Introduction to Linear Algebra, Third edition, Wellesley-Cambridge Press (2003).
Source: Gilbert Strang, "The Four Fundamental Subspaces: 4 Lines," MIT: https://web.mit.edu/18.06/www/Essays/newpaper_ver3.pdf