线性代数 | Graphic Notes on “Linear Algebra for Everyone”

注：本文为 "The Art of Linear Algebra 重排版" 。

英文引文，机翻未校。

如有内容异常，请看原文。

The Art of Linear Algebra

Graphic Notes on "Linear Algebra for Everyone"

Kenji Hiranabe *

with the kindest help of Gilbert Strang †

translator: Kefang Liu ‡

September 1, 2021/updated August 19, 2023

线性代数的艺术

摘要

我尝试为吉尔伯特·斯特朗（Gilbert Strang）在《人人都懂线性代数》（"Linear Algebra for Everyone"）一书中介绍的矩阵重要概念进行可视化图解，以促进从矩阵分解的角度对向量、矩阵计算和算法的理解。这些概念包括矩阵分解（列 - 行分解，Column - Row, CR）、高斯消去法（Gaussian Elimination, LU）、格拉姆 - 施密特正交化（Gram - Schmidt Orthogonalization, QR）、特征值与对角化（Eigenvalues and Diagonalization, Q Λ Q T Q\Lambda Q^T QΛQT）以及奇异值分解（Singular Value Decomposition, U Σ V T U\Sigma V^T UΣVT）。

序言

我很高兴能看到 Kenji Hiranabe 关于线性代数中矩阵运算的图解！这样的图解是展示代数的绝佳方式。我们固然可以通过行 · 列的点乘来理解矩阵乘法，但这绝非全部------矩阵乘法是由"线性组合"与"秩 1 矩阵"共同构成的代数与艺术。我很感激能看到这本书的日文翻译版本以及 Kenji Hiranabe 图解中所蕴含的思想。

------吉尔伯特·斯特朗（Gilbert Strang）

麻省理工学院数学教授

twitter: @hiranabe, k-hiranabe@esm.co.jp, https://anagileway.com
† Massachusetts Institute of Technology, http://www-math.mit.edu/~gs/
‡ twitter: @kfcliu, 微博用户: 5717297833
"Linear Algebra for Everyone": http://math.mit.edu/everyone/

1 理解矩阵------4 个视角

一个矩阵可以被视为 1 个矩阵、 m × n m\times n m×n 个元素、 n n n 个列向量和 m m m 个行向量。

图 1 ：从四个角度理解矩阵

在这里，列向量被标记为粗体 a j a_j aj；行向量则有一个 ∗ * ∗ 号，标记为 a i ∗ a_i^* ai∗；转置向量和矩阵则用 T T T 标记为 a j T a_j^T ajT 和 A T A^T AT。

2 向量乘以向量------2 个视角

在后文中，我将介绍一些概念，同时列出《人人都懂线性代数》（"Linear Algebra for Everyone"）一书中的相应部分（部分编号插入如下）。详细内容建议参考原书，此处我也添加了简短解释，以便您能通过本文尽可能多地理解相关知识。此外，每个图都有一个简短名称，例如 v 1 v1 v1（数字 1 表示向量的乘积）、 M v 1 Mv1 Mv1（数字 1 表示矩阵和向量的乘积），以及如下图（ v 1 v1 v1）所示的彩色圆圈。随着讨论的推进，会对该名称进行交叉引用。

1.1 节（第 2 页）线性组合与点积（Linear combination and dot products）
1.3 节（第 25 页）秩 1 矩阵（Matrix of Rank One）
1.4 节（第 29 页）行视角与列视角（Row way and column way）

图 2 ：向量乘以向量 -（ v 1 v1 v1），（ v 2 v2 v2）

（ v 1 v1 v1）是两个向量之间的基础运算，即向量内积；而（ v 2 v2 v2）是将列向量乘以行向量，从而产生一个秩 1 矩阵。

理解（ v 2 v2 v2）的结果（秩 1 矩阵）是后续章节的关键。

3 矩阵乘以向量------2 个视角

一个矩阵乘以一个向量，会产生由三个点积组成的向量（ M v 1 Mv1 Mv1），同时也是矩阵列向量的一种线性组合（ M v 2 Mv2 Mv2）。

1.1 节（第 3 页）线性组合（Linear combinations）
1.3 节（第 21 页）矩阵与列空间（Matrices and Column Spaces）

图 3 ：矩阵乘以向量 -（ M v 1 Mv1 Mv1），（ M v 2 Mv2 Mv2）

通常，人们会先学习（ M v 1 Mv1 Mv1）这种矩阵乘向量的视角。但当你习惯从（ M v 2 Mv2 Mv2）的视角看待矩阵乘向量时，会理解到 A x Ax Ax 是矩阵 A A A 列向量的线性组合。矩阵 A A A 的列向量的所有线性组合生成的子空间记为 C ( A ) C(A) C(A)（列空间）， A x = 0 Ax = 0 Ax=0 的解空间则是零空间，记为 N ( A ) N(A) N(A)。

同理，由（ v M 1 vM1 vM1）和（ v M 2 vM2 vM2）可知，行向量乘以矩阵也可以用类似的方式理解。

图 4 ：向量乘以矩阵 -（ v M 1 vM1 vM1），（ v M 2 vM2 vM2）

上图中，矩阵 A A A 的行向量的所有线性组合生成的子空间记为 C ( A T ) C(A^T) C(AT)（行空间）， y T A = 0 y^T A = 0 yTA=0 的解空间是 A A A 的左零空间，记为 N ( A T ) N(A^T) N(AT)。

Row Vector Multiplication with Matrix A A A

矩阵 A A A 的行向量乘法

A row vector y y y is multiplied by the two column vectors of A A A and become the two dot-product elements of y A yA yA.

行向量 y y y 与矩阵 A A A 的两个列向量相乘，成为 y A yA yA 的两个点积元素。

A = [ y 1 y 2 y 3 ] [ 1 2 3 4 5 6 ] = [ ( y 1 + 3 y 2 + 5 y 3 ) ( 2 y 1 + 4 y 2 + 6 y 3 ) ] A = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} (y_1 + 3y_2 + 5y_3) & (2y_1 + 4y_2 + 6y_3) \end{bmatrix} A=[y1y2y3] 135246 =[(y1+3y2+5y3)(2y1+4y2+6y3)]

行向量 y y y 与矩阵 A A A 相乘，结果是一个新行向量，它由 y y y 与 A A A 的每个列向量的点积形成。
Linear Combination of Row Vectors of Matrix A A A

矩阵 A A A 的行向量的线性组合

The product y A yA yA is a linear combination of the row vectors of A A A.

乘积 y A yA yA 是矩阵 A A A 行向量的线性组合。

y A = [ y 1 y 2 y 3 ] [ 1 2 3 4 5 6 ] = y 1 [ 1 2 ] + y 2 [ 3 4 ] + y 3 [ 5 6 ] yA = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = y_1 \begin{bmatrix} 1 & 2 \end{bmatrix} + y_2 \begin{bmatrix} 3 & 4 \end{bmatrix} + y_3 \begin{bmatrix} 5 & 6 \end{bmatrix} yA=[y1y2y3] 135246 =y1[12]+y2[34]+y3[56]

行向量 y y y 与矩阵 A A A 的乘积得到的新向量是 A A A 的行向量的线性组合。

本书的一大亮点是介绍了四个基本子空间：在 R n \mathbb{R}^n Rn 上的 C ( A ) C(A) C(A)（列空间）和 N ( A T ) N(A^T) N(AT)（左零空间）（相互正交），以及在 R m \mathbb{R}^m Rm 上的 C ( A T ) C(A^T) C(AT)（行空间）和 N ( A ) N(A) N(A)（零空间）（相互正交）。

3.5 节（第 124 页）四个子空间的维数（Dimensions of the Four Subspaces）

图 5 ：四个子空间

关于秩 r r r，请见（6.1 节） A = C R A = CR A=CR 分解。

4 矩阵乘以矩阵------4 个视角

由"矩阵乘以向量"可自然延伸到"矩阵乘以矩阵"。

1.4 节（第 35 页）矩阵乘法的四种方式（Four ways to multiply）
也可参考原书封底

图 6 ：矩阵乘以矩阵 -（ M M 1 MM1 MM1），（ M M 2 MM2 MM2），（ M M 3 MM3 MM3），（ M M 4 MM4 MM4）

第一种方式（按元素计算）：

Every element becomes a dot product of row vector and column vector.

每个元素都变成行向量和列向量的点积。

1 2 3 4 5 6 \] \[ x 1 y 1 x 2 y 2 \] = \[ ( x 1 + 2 x 2 ) ( y 1 + 2 y 2 ) ( 3 x 1 + 4 x 2 ) ( 3 y 1 + 4 y 2 ) ( 5 x 1 + 6 x 2 ) ( 5 y 1 + 6 y 2 ) \] \\begin{bmatrix}1\&2\\\\3\&4\\\\5\&6\\end{bmatrix}\\begin{bmatrix}x_1\&y_1\\\\x_2\&y_2\\end{bmatrix}=\\begin{bmatrix}(x_1 + 2x_2)\&(y_1 + 2y_2)\\\\(3x_1 + 4x_2)\&(3y_1 + 4y_2)\\\\(5x_1 + 6x_2)\&(5y_1 + 6y_2)\\end{bmatrix} 135246 \[x1x2y1y2\]= (x1+2x2)(3x1+4x2)(5x1+6x2)(y1+2y2)(3y1+4y2)(5y1+6y2) 第二种方式（按列组合）： A x Ax Ax and A y Ay Ay are linear combinations of columns of A A A. A x Ax Ax 和 A y Ay Ay 是矩阵 A A A 列向量的线性组合。 \[ 1 2 3 4 5 6 \] \[ x 1 y 1 x 2 y 2 \] = A \[ x y \] = \[ A x A y \] \\begin{bmatrix}1\&2\\\\3\&4\\\\5\&6\\end{bmatrix}\\begin{bmatrix}x_1\&y_1\\\\x_2\&y_2\\end{bmatrix}=A\\begin{bmatrix}x\&y\\end{bmatrix}=\\begin{bmatrix}Ax\&Ay\\end{bmatrix} 135246 \[x1x2y1y2\]=A\[xy\]=\[AxAy

第三种方式（按行组合）：

The produced rows are linear combinations of rows.

产生的行是行的线性组合。

推导过程 ：从左至右观察矩阵 A A A 的列向量，保留其中线性无关的列向量，去掉可由前面保留的列向量线性表出的列向量。例如，若矩阵 A A A 的第 1、2 列线性无关，第 3 列可由前两列之和表示，则保留第 1、2 列构成 C C C；若要通过 C C C 重新构造出 A A A，则需要右乘一个行阶梯矩阵 R R R。

图 11 ：矩阵 A A A 中列的秩

此时可发现，矩阵 A A A 的列秩为 2，因为 C C C 中仅包含 2 个线性无关的列向量，且 A A A 中所有的列向量都可由 C C C 中的这 2 个列向量线性表出。

图 12 ：矩阵 A A A 中行的秩

同理，矩阵 A A A 的行秩也为 2，因为 R R R 中仅包含 2 个线性无关的行向量，且 A A A 中所有的行向量都可由 R R R 中的这 2 个行向量线性表出。

6.2 A = L U A = LU A=LU（高斯消去分解）

用高斯消除法求解线性方程组 A x = b Ax = b Ax=b 的过程，本质上就是对矩阵 A A A 进行 L U LU LU 分解。通常， L U LU LU 分解是通过对矩阵 A A A 左乘一系列初等行变换矩阵 E E E，将其转化为上三角矩阵 U U U，即 E A = U EA = U EA=U，进而可得 A = E − 1 U A = E^{-1}U A=E−1U，令 L = E − 1 L = E^{-1} L=E−1（ L L L 为下三角矩阵），则 A = L U A = LU A=LU。

E A = U A = E − 1 U let L = E − 1 , A = L U \begin{align*} EA &= U \\ A &= E^{-1}U \\ \text{let } L = E^{-1}, \quad A &= LU \end{align*} EAAlet L=E−1,A=U=E−1U=LU

现在, 求解 A x = b Ax = b Ax=b 有 2 步: (1) 求解 L c = b Lc = b Lc=b, (2) 代回 U x = c Ux = c Ux=c.

2.3 节（第 57 页）矩阵计算与 A = L U A = LU A=LU 分解（Matrix Computations and A = L U A = LU A=LU）

在这里，我们直接通过 A A A 计算 L L L 和 U U U：

图 13 ：矩阵 A A A 的递归秩 1 矩阵分离

要计算 L L L 和 U U U，首先分离出由矩阵 A A A 的第一行 u 1 ∗ u_1^* u1∗ 和第一列 l 1 l_1 l1 组成的外积 l 1 u 1 ∗ l_1 u_1^* l1u1∗，余下的部分记为 A 2 A_2 A2；然后对 A 2 A_2 A2 递归执行此操作，最终将矩阵 A A A 分解为若干个秩 1 矩阵之和，进而得到 L L L 和 U U U。

分离：

每次只拆分当前矩阵的"首列首行"，留下右下角剩余子块。
递归

用同一分离规则拆分剩余部分，重复操作直到剩余子块为零。
步骤

第 1 次分离：由 A A A 的"首列分块 l 1 \boldsymbol{l}_1 l1"和"首行分块 u 1 ∗ \boldsymbol{u}_1^* u1∗"相乘得到的"秩 1 分块矩阵"（ l 1 u 1 ∗ \boldsymbol{l}_1 \boldsymbol{u}_1^* l1u1∗）；
A → l 1 u 1 ∗ + A 2 A \to \boldsymbol{l}_1 \boldsymbol{u}_1^* + A_2 A→l1u1∗+A2；

第 2 次分离：不再处理第一步已拆分的 l 1 u 1 ∗ \boldsymbol{l}_1 \boldsymbol{u}_1^* l1u1∗，而是针对除 l 1 u 1 ∗ \boldsymbol{l}_1 \boldsymbol{u}_1^* l1u1∗的"右下角子块 A 2 A_2 A2"，再次进行"分离"------将 A 2 A_2 A2 拆成 l 2 u 2 ∗ \boldsymbol{l}_2 \boldsymbol{u}_2^* l2u2∗，以及新的子块 A 3 A_3 A3。
A 2 → l 2 u 2 ∗ + A 3 A_2 \to \boldsymbol{l}_2 \boldsymbol{u}_2^* + A_3 A2→l2u2∗+A3，代入第一步得 A → l 1 u 1 ∗ + l 2 u 2 ∗ + A 3 A \to \boldsymbol{l}_1 \boldsymbol{u}_1^* + \boldsymbol{l}_2 \boldsymbol{u}_2^* + A_3 A→l1u1∗+l2u2∗+A3；

......

直到第 r r r 次分离：剩余子块为零矩阵（即 A k + 1 = 0 A_{k+1} = 0 Ak+1=0），当剩余子块不再包含有效信息（秩为 0）时，停止拆分 ，此时所有非零信息已被完全分解到 l k u k ∗ \boldsymbol{l}_k \boldsymbol{u}_k^* lkuk∗ 项中，得到 r r r 个非零分块项 l 1 u 1 ∗ , l 2 u 2 ∗ , ... , l r u r ∗ \boldsymbol{l}_1 \boldsymbol{u}_1^*, \boldsymbol{l}_2 \boldsymbol{u}_2^*, \dots, \boldsymbol{l}_r \boldsymbol{u}r^* l1u1∗,l2u2∗,...,lrur∗，且 r r r 正是矩阵 A A A 的秩（非零子块的最大数量）。最终分解式为：
A = ∑ k = 1 r l k u k ∗ = L U A = \sum{k=1}^r \boldsymbol{l}_k \boldsymbol{u}_k^* = LU A=∑k=1rlkuk∗=LU（ L L L 由 l 1 , l 2 , ... , l r \boldsymbol{l}_1,\boldsymbol{l}_2,...,\boldsymbol{l}_r l1,l2,...,lr 组成， U U U 由 u 1 ∗ , u 2 ∗ , ... , u r ∗ \boldsymbol{u}_1^*,\boldsymbol{u}_2^*,...,\boldsymbol{u}_r^* u1∗,u2∗,...,ur∗ 组成）。

图 14 ：由 L U LU LU 重新构造矩阵 A A A

通过下三角矩阵 L L L 乘以上三角矩阵 U U U 来重新构造矩阵 A A A 相对简单，即 A = L U A = LU A=LU。

6.3 A = Q R A = QR A=QR（正交三角分解）

QR 分解 是在保持矩阵 A A A 列空间不变（即 C ( A ) = C ( Q ) C(A) = C(Q) C(A)=C(Q)）的条件下，将矩阵 A A A 表示为 列正交矩阵 Q Q Q 与 上三角矩阵 R R R 的乘积，即 A = Q R A = QR A=QR。

其中， Q Q Q 的列向量是通过对 A A A 的列向量组进行正交化与单位化得到的标准正交向量组，它们构成 A A A 列空间的一组标准正交基。若 A A A 是可逆方阵，则 Q Q Q 为正交矩阵（满足 Q Q T = Q T Q = I QQ^T = Q^TQ = I QQT=QTQ=I）。

4.4 节正交矩阵与格拉姆 - 施密特正交化（Orthogonal matrices and Gram - Schmidt）（第 165 页）
以 3 维列向量为例，格拉姆 - 施密特正交化通过以下步骤构造正交矩阵 Q Q Q 的列向量：

第一个列向量的单位化

对 A A A 的第一个列向量 a 1 \boldsymbol{a}_1 a1，单位化后得到 q 1 \boldsymbol{q}_1 q1：
q 1 = a 1 ∥ a 1 ∥ \boldsymbol{q}_1 = \frac{\boldsymbol{a}_1}{\|\boldsymbol{a}_1\|} q1=∥a1∥a1

其中 ∥ a 1 ∥ \|\boldsymbol{a}_1\| ∥a1∥ 是 a 1 \boldsymbol{a}_1 a1 的欧几里得范数，即 ∥ a 1 ∥ = a 1 T a 1 \|\boldsymbol{a}_1\| = \sqrt{\boldsymbol{a}_1^T \boldsymbol{a}_1} ∥a1∥=a1Ta1 。
第二个列向量的正交化与单位化

对 A A A 的第二个列向量 a 2 \boldsymbol{a}_2 a2，先减去其在 q 1 \boldsymbol{q}_1 q1 上的投影，得到与 q 1 \boldsymbol{q}_1 q1 正交的临时向量 q 2 temp \boldsymbol{q}_2^{\text{temp}} q2temp；再对 q 2 temp \boldsymbol{q}_2^{\text{temp}} q2temp 单位化，得到 q 2 \boldsymbol{q}_2 q2：
q 2 temp = a 2 − ( q 1 T a 2 ) q 1 , q 2 = q 2 temp ∥ q 2 temp ∥ \boldsymbol{q}_2^{\text{temp}} = \boldsymbol{a}_2 - (\boldsymbol{q}_1^T \boldsymbol{a}_2)\boldsymbol{q}_1, \quad \boldsymbol{q}_2 = \frac{\boldsymbol{q}_2^{\text{temp}}}{\|\boldsymbol{q}_2^{\text{temp}}\|} q2temp=a2−(q1Ta2)q1,q2=∥q2temp∥q2temp
第三个列向量的正交化与单位化

对 A A A 的第三个列向量 a 3 \boldsymbol{a}_3 a3，减去其在 q 1 \boldsymbol{q}_1 q1 和 q 2 \boldsymbol{q}_2 q2 上的投影，得到与 q 1 , q 2 \boldsymbol{q}_1, \boldsymbol{q}_2 q1,q2 正交的临时向量 q 3 temp \boldsymbol{q}_3^{\text{temp}} q3temp；再对 q 3 temp \boldsymbol{q}_3^{\text{temp}} q3temp 单位化，得到 q 3 \boldsymbol{q}_3 q3：
q 3 temp = a 3 − ( q 1 T a 3 ) q 1 − ( q 2 T a 3 ) q 2 , q 3 = q 3 temp ∥ q 3 temp ∥ \boldsymbol{q}_3^{\text{temp}} = \boldsymbol{a}_3 - (\boldsymbol{q}_1^T \boldsymbol{a}_3)\boldsymbol{q}_1 - (\boldsymbol{q}_2^T \boldsymbol{a}_3)\boldsymbol{q}_2, \quad \boldsymbol{q}_3 = \frac{\boldsymbol{q}_3^{\text{temp}}}{\|\boldsymbol{q}_3^{\text{temp}}\|} q3temp=a3−(q1Ta3)q1−(q2Ta3)q2,q3=∥q3temp∥q3temp

从列向量表示推导 A = Q R A = QR A=QR

记 r i j = q i T a j r_{ij} = \boldsymbol{q}_i^T \boldsymbol{a}_j rij=qiTaj（ i , j = 1 , 2 , 3 i,j = 1,2,3 i,j=1,2,3），则矩阵 A A A 的列向量可由 Q Q Q 的列向量线性表出：

第一个列向量： a 1 = r 11 q 1 \boldsymbol{a}1 = r{11}\boldsymbol{q}_1 a1=r11q1；

第二个列向量： a 2 = r 12 q 1 + r 22 q 2 \boldsymbol{a}2 = r{12}\boldsymbol{q}1 + r{22}\boldsymbol{q}_2 a2=r12q1+r22q2；

第三个列向量： a 3 = r 13 q 1 + r 23 q 2 + r 33 q 3 \boldsymbol{a}3 = r{13}\boldsymbol{q}1 + r{23}\boldsymbol{q}2 + r{33}\boldsymbol{q}_3 a3=r13q1+r23q2+r33q3。

将这些线性关系以矩阵形式整合，可得 A = Q R A = QR A=QR。其中：

Q = [ q 1 q 2 q 3 ] Q = \begin{bmatrix} \boldsymbol{q}_1 & \boldsymbol{q}_2 & \boldsymbol{q}_3 \end{bmatrix} Q=[q1q2q3] 为正交矩阵（满足 Q Q T = Q T Q = I QQ^T = Q^TQ = I QQT=QTQ=I）；
R R R 为上三角矩阵，形式为：
R = [ r 11 r 12 r 13 r 22 r 23 r 33 ] R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ & r_{22} & r_{23} \\ & & r_{33} \end{bmatrix} R= r11r12r22r13r23r33

因此，原矩阵 A A A 最终表示为"正交矩阵 Q Q Q 与上三角矩阵 R R R 的乘积"，即：
A = Q R = [ ∣ ∣ ∣ q 1 q 2 q 3 ∣ ∣ ∣ ] [ r 11 r 12 r 13 r 22 r 23 r 33 ] A = QR = \begin{bmatrix} \mid & \mid & \mid \\ \boldsymbol{q}1 & \boldsymbol{q}2 & \boldsymbol{q}3 \\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix} r{11} & r{12} & r{13} \\ & r_{22} & r_{23} \\ & & r_{33} \end{bmatrix} A=QR= ∣q1∣∣q2∣∣q3∣ r11r12r22r13r23r33

其中 R R R 的元素满足： r i j = q i T a j r_{ij} = \boldsymbol{q}_i^T \boldsymbol{a}j rij=qiTaj（ i ≤ j i \leq j i≤j）， r i j = 0 r{ij} = 0 rij=0（ i > j i > j i>j）。

图 15 ： A = Q R A = QR A=QR 分解

通过 Q R QR QR 分解，矩阵 A A A 的列向量被转化为正交矩阵 Q Q Q 的列向量（一个正交集合），且矩阵 A A A 的每一个列向量都可以用 Q Q Q 和上三角矩阵 R R R 重新构造出来，具体可参考前面的 P 1 P1 P1 模式（列运算）。

6.4 S = Q Λ Q T S = Q\Lambda Q^T S=QΛQT（对称矩阵特征值分解）

所有对称矩阵 S S S（满足 S = S T S = S^T S=ST）都具有实特征值和正交的特征向量。其中，特征值 λ i \lambda_i λi 是对角矩阵 Λ \Lambda Λ 的对角元素，特征向量 q i q_i qi 构成正交矩阵 Q Q Q 的列向量。

6.3 节（第 227 页）对称正定矩阵（Symmetric Positive Definite Matrices）

对称矩阵 S S S 的特征值分解表达式为：
S = Q Λ Q T = [ ∣ ∣ ∣ q 1 q 2 q 3 ∣ ∣ ∣ ] [ λ 1 λ 2 λ 3 ] [ − q 1 T − − q 2 T − − q 3 T − ] S=Q\Lambda Q^T=\begin{bmatrix}\mid&\mid&\mid\\q_1&q_2&q_3\\\mid&\mid&\mid\end{bmatrix}\begin{bmatrix}\lambda_1&&\\&\lambda_2&\\&&\lambda_3\end{bmatrix}\begin{bmatrix}-q_1^T -\\-q_2^T -\\-q_3^T -\end{bmatrix} S=QΛQT= ∣q1∣∣q2∣∣q3∣ λ1λ2λ3 −q1T−−q2T−−q3T−

进一步展开为秩 1 投影矩阵的线性组合：

S = λ 1 [ ∣ q 1 ∣ ] [ − q 1 T − ] + λ 2 [ ∣ q 2 ∣ ] [ − q 2 T − ] + λ 3 [ ∣ q 3 ∣ ] [ − q 3 T − ] = λ 1 P 1 + λ 2 P 2 + λ 3 P 3 \begin{aligned} S &= \lambda_1\begin{bmatrix}\mid\\q_1\\\mid\end{bmatrix}\begin{bmatrix}-q_1^T -\end{bmatrix} + \lambda_2\begin{bmatrix}\mid\\q_2\\\mid\end{bmatrix}\begin{bmatrix}-q_2^T -\end{bmatrix} + \lambda_3\begin{bmatrix}\mid\\q_3\\\mid\end{bmatrix}\begin{bmatrix}-q_3^T -\end{bmatrix} \\ &= \lambda_1 P_1 + \lambda_2 P_2 + \lambda_3 P_3 \end{aligned} S=λ1 ∣q1∣ [−q1T−]+λ2 ∣q2∣ [−q2T−]+λ3 ∣q3∣ [−q3T−]=λ1P1+λ2P2+λ3P3

P 1 = q 1 q 1 T , P 2 = q 2 q 2 T , P 3 = q 3 q 3 T P_1 = q_1 q_1^{\text{T}}, \quad P_2 = q_2 q_2^{\text{T}}, \quad P_3 = q_3 q_3^{\text{T}} P1=q1q1T,P2=q2q2T,P3=q3q3T

图 16：对称矩阵特征值分解

对称矩阵的谱定理

一个对称矩阵 S S S 可通过正交矩阵 Q Q Q 及其转置矩阵 Q T Q^T QT 对角化为对角矩阵 Λ \Lambda Λ，并且能进一步分解为一阶投影矩阵 P i = q i q i T P_i = \boldsymbol{q}_i \boldsymbol{q}_i^T Pi=qiqiT（其中 q i \boldsymbol{q}_i qi 是 Q Q Q 的列向量）的线性组合，这一结论即为谱定理（Spectral Theorem）。

分解的性质

此处的分解依赖于特定的性质（记为 P 4 P_4 P4），相关关系与投影矩阵 P i P_i Pi 的性质如下：

对称矩阵的组合表示 ：

对称矩阵 S S S（满足 S = S T S = S^T S=ST）可表示为特征值与对应投影矩阵的线性组合：
S = S T = λ 1 P 1 + λ 2 P 2 + λ 3 P 3 S = S^T = \lambda_1 P_1 + \lambda_2 P_2 + \lambda_3 P_3 S=ST=λ1P1+λ2P2+λ3P3

其中 λ 1 , λ 2 , λ 3 \lambda_1, \lambda_2, \lambda_3 λ1,λ2,λ3 是 S S S 的特征值， P 1 , P 2 , P 3 P_1, P_2, P_3 P1,P2,P3 是对应一阶投影矩阵。
单位矩阵的分解 ：

正交矩阵 Q Q Q 满足列向量的"完备性"，即各投影矩阵之和为单位矩阵 I I I：
Q Q T = P 1 + P 2 + P 3 = I Q Q^T = P_1 + P_2 + P_3 = I QQT=P1+P2+P3=I
投影矩阵的自身性质：
- 对称性与幂等性 ： P i 2 = P i = P i T P_i^2 = P_i = P_i^T Pi2=Pi=PiT（投影矩阵自身平方等于自身，且转置等于自身）；
- 正交性 ：当 i ≠ j i \neq j i=j 时， P i P j = O P_i P_j = O PiPj=O（不同投影矩阵的乘积为零矩阵 O O O）。

6.5 A = U Σ V T A = U\Sigma V^T A=UΣVT（奇异值分解）

7.1 节（第 259 页）奇异值与奇异向量（Singular Values and Singular Vectors）
包括长方阵（ m ≠ n m\neq n m=n）在内的所有矩阵都具有奇异值分解（SVD）。在 A = U Σ V T A = U\Sigma V^T A=UΣVT 中， U U U（ m × m m\times m m×m 矩阵）的列向量是 A A T AA^T AAT 的特征向量（称为左奇异向量）， V V V（ n × n n\times n n×n 矩阵）的列向量是 A T A A^T A ATA 的特征向量（称为右奇异向量）， Σ \Sigma Σ（ m × n m\times n m×n 矩阵）的对角线上的元素是 A T A A^T A ATA（或 A A T AA^T AAT）非零特征值的平方根（称为奇异值），非对角线元素均为 0。

简化版 SVD 特点：

正交性： U U T = I m U U^T = I_m UUT=Im（ I m I_m Im 为 m m m 阶单位矩阵）， V V T = I n V V^T = I_n VVT=In（ I n I_n In 为 n n n 阶单位矩阵），即 U U U 和 V V V 均为正交矩阵。
对角化作用： U T A V = Σ U^T A V = \Sigma UTAV=Σ，即通过 U T U^T UT（左乘）和 V V V（右乘）将矩阵 A A A 对角化为 Σ \Sigma Σ。
秩 1 组合形式： A = U Σ V T = ∑ k = 1 r σ k u k v k T A = U\Sigma V^T=\sum_{k = 1}^r \sigma_k u_k v_k^T A=UΣVT=∑k=1rσkukvkT，其中 r r r 为矩阵 A A A 的秩， σ k \sigma_k σk 为奇异值， u k u_k uk 为左奇异向量， v k v_k vk 为右奇异向量，该式将 A A A 表示为秩 1 矩阵的线性组合。

图 17 ：矩阵分解（SVD） A = U Σ V T A = U\Sigma V^T A=UΣVT

奇异值分解是通过正交矩阵 U U U 和 V V V 以及对角矩阵 Σ \Sigma Σ，将任意矩阵 A A A 分解为具有明确几何意义（如旋转、缩放、投影）的矩阵乘积形式。

可以发现：

V V V 是 R n \mathbb{R}^n Rn 空间中 A T A A^T A ATA 的特征向量构成的标准正交基；
U U U 是 R m \mathbb{R}^m Rm 空间中 A A T A A^T AAT 的特征向量构成的标准正交基。

U U U 和 V V V 共同将矩阵 A A A 对角化为对角矩阵 Σ \Sigma Σ，且 A A A 还可表示为秩 1 矩阵的线性组合，具体推导如下：

A = U Σ V T = [ ∣ ∣ ∣ u 1 u 2 u 3 ∣ ∣ ∣ ] [ σ 1 σ 2 ] [ − v 1 T − − v 2 T − ] = σ 1 [ ∣ u 1 ∣ ] [ − v 1 T − ] + σ 2 [ ∣ u 2 ∣ ] [ − v 2 T − ] = σ 1 u 1 v 1 T + σ 2 u 2 v 2 T \begin{align*} A &= U\Sigma V^T \\ &= \begin{bmatrix} \mid & \mid & \mid \\ \boldsymbol{u}_1 & \boldsymbol{u}_2 & \boldsymbol{u}_3 \\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix} \sigma_1 & \\ & \sigma_2 \\ & \end{bmatrix} \begin{bmatrix} - \boldsymbol{v}_1^T - \\ - \boldsymbol{v}_2^T - \end{bmatrix} \\ &= \sigma_1 \begin{bmatrix} \mid \\ \boldsymbol{u}_1 \\ \mid \end{bmatrix} \begin{bmatrix} - \boldsymbol{v}_1^T - \end{bmatrix} + \sigma_2 \begin{bmatrix} \mid \\ \boldsymbol{u}_2 \\ \mid \end{bmatrix} \begin{bmatrix} - \boldsymbol{v}_2^T - \end{bmatrix} \\ &= \sigma_1 \boldsymbol{u}_1 \boldsymbol{v}_1^T + \sigma_2 \boldsymbol{u}_2 \boldsymbol{v}_2^T \end{align*} A=UΣVT= ∣u1∣∣u2∣∣u3∣ σ1σ2 [−v1T−−v2T−]=σ1 ∣u1∣ [−v1T−]+σ2 ∣u2∣ [−v2T−]=σ1u1v1T+σ2u2v2T

注意

正交矩阵 U U U 和 V V V 满足：

U U T = I m U U^T = I_m UUT=Im（ I m I_m Im 为 m × m m \times m m×m 单位矩阵）
V V T = I n V V^T = I_n VVT=In（ I n I_n In 为 n × n n \times n n×n 单位矩阵）

奇异值分解的图释细节可参考 P 4 P_4 P4。

总结和致谢

我展示了矩阵/向量乘法的系统可视化表达，以及这些表达在五种矩阵分解中的应用。希望大家能喜欢这些内容，并通过它们加深对线性代数的理解。

阿什利·费尔南德斯（Ashley Fernandes）在排版过程中帮助我美化了这篇论文，使其格式更统一、更专业。

在完成这篇论文之际，我要感谢吉尔伯特·斯特朗（Gilbert Strang）教授出版《人人都懂线性代数》（"Linear Algebra for Everyone"）一书。它引导我们从新的视角探索线性代数中的美妙内容，其中既介绍了当代与传统的数据科学和机器学习相关知识，也让每个人都能通过实用的方式理解线性代数的基本思想------这些都是矩阵世界的重要组成部分。

参考文献与相关工作

Gilbert Strang (2020), Linear Algebra for Everyone, Wellesley Cambridge Press.
Gilbert Strang (2016), Introduction to Linear Algebra, Wellesley Cambridge Press, 5th ed.
Kenji Hiranabe (2021), Map of Eigenvalues, An Agile Way (blog)

图 18 ：特征值图（实 n × n n\times n n×n 方阵的特征值映射）
Kenji Hiranabe (2020), Matrix World, An Agile Way (blog)

图 19 ：矩阵世界（Linear Algebra Matrix World）

（由 Kenji Hiranabe 在 Gilbert Strang 教授的帮助下绘制，版本 v1.4.7，2022 年 10 月 25 日，译者：刘克芳）
New Ideas in "Linear Algebra for Everyone" Gilbert Strangi - compare LAFE-ILA5I

updated February 4, 2024

Essence of linear algebra

线性代数的本质

A geometric understanding of matrices, determinants, eigen-stuffs and more.
矩阵、行列式、特征值与特征向量等概念的几何视角解读

Vectors

向量

Always think vectors in the coordinate space

始终在坐标空间中理解向量
It's a directed line, starting at the origin, with certain length

向量是从原点出发、具有特定长度的有向线段
Addition of vectors

向量的加法
each vector represents a "movement" with steps and direction

每个向量都代表一次包含步长和方向的"移动"
a sum of two vectors can be thought as a two-step movement: first step move with vector 1 and move 2 after the first step move

两个向量的和可理解为两次连续的移动：先沿第一个向量移动，再沿第二个向量移动
Vector multiple by a number ("scalar")

向量与数（"标量"）的乘法
"stretching" each coordinate by the value of scalar

按标量的数值"拉伸"向量的每个坐标分量

Linear combinations, span, and basis vectors

线性组合、张成空间与基向量

Span (of a vector): the line aligned with/goes through the vector
（单个向量的）张成空间：与该向量共线且经过该向量的直线

Linear transformation and matrices

线性变换与矩阵

"Linear" transformation (properties) => grid lines remain parallel and evenly spaced:

"线性"变换（性质）：网格线保持平行且间距均匀，具体表现为：
all lines remain as lines without getting curved

所有直线变换后仍为直线，不会弯曲
the origin remain fixed

原点保持不动
e.g. rotation, stretch

例如：旋转、拉伸
It can be considered as where the base vectors ( i ^ \hat{i} i^ and j ^ \hat{j} j^) land after transformation, then everything else (i.e. all the vectors) in the coordinate follows

线性变换可通过基向量（ i ^ \hat{i} i^ 和 j ^ \hat{j} j^）变换后的落点来确定，坐标空间中的所有其他向量都会遵循基向量的变换规律

if v → = c 1 i ^ + c 2 j ^ \text{if } \overrightarrow{v} = c_1 \hat{i} + c_2\hat{j} if v =c1i^+c2j^

and after transformation i ^ lands on i → and j ^ lands on j → \text{and after transformation } \hat{i} \text{ lands on } \overrightarrow{i} \text{ and } \hat{j} \text{ lands on }\overrightarrow{j} and after transformation i^ lands on i and j^ lands on j
且变换后 i ^ 落在 i → 处， j ^ 落在 j → 处 \text{且变换后 } \hat{i} \text{ 落在 } \overrightarrow{i} \text{ 处，}\hat{j} \text{ 落在 } \overrightarrow{j} \text{ 处} 且变换后 i^ 落在 i 处，j^ 落在 j 处

then transformed v → can be written as v → ′ = c 1 i → + c 2 j → \text{then transformed } \overrightarrow{v} \text{ can be written as } \overrightarrow{v}' = c_1 \overrightarrow{i} + c_2\overrightarrow{j} then transformed v can be written as v ′=c1i +c2j
则变换后的 v → 可表示为 v → ′ = c 1 i → + c 2 j → \text{则变换后的 } \overrightarrow{v} \text{ 可表示为 } \overrightarrow{v}' = c_1 \overrightarrow{i} + c_2\overrightarrow{j} 则变换后的 v 可表示为 v ′=c1i +c2j
Thus the linear transformation of a 2-dimensional vector can be fully described by a 2 × 2 2 \times 2 2×2 matrix defined by the vectors i → \overrightarrow{i} i and j → \overrightarrow{j} j , or [ i → j → ] [\overrightarrow{i} \ \overrightarrow{j}] [i j ].

因此，二维向量的线性变换可由一个 2 × 2 2 \times 2 2×2 矩阵完整描述，该矩阵由变换后的基向量 i → \overrightarrow{i} i 和 j → \overrightarrow{j} j 构成，即 [ i → j → ] [\overrightarrow{i} \ \overrightarrow{j}] [i j ]。
Calculation of linear transformation of a vector as the dot product of a matrix with a vector:

向量的线性变换可通过矩阵与向量的点积计算，公式如下：

a b c d \] \[ e f \] = e ∗ \[ a c \] + f ∗ \[ b d \] = \[ a ∗ e + b ∗ f c ∗ e + d ∗ f \] \\begin{split}\\begin{bmatrix} a \& b \\\\ c \& d \\end{bmatrix} \\begin{bmatrix} e \\\\ f \\end{bmatrix} = e \* \\begin{bmatrix} a\\\\ c \\end{bmatrix} + f \* \\begin{bmatrix} b \\\\ d \\end{bmatrix} = \\begin{bmatrix} a\*e + b\*f \\\\ c\*e + d\*f \\end{bmatrix}\\end{split} \[acbd\]\[ef\]=e∗\[ac\]+f∗\[bd\]=\[a∗e+b∗fc∗e+d∗f

Matrices multiplication can be thought as splitting the second matrix column-wise into individual column vectors, conduct above calculation for each column vector and combine the results column-wise.

矩阵乘法可理解为：将第二个矩阵按列拆分为单个列向量，对每个列向量分别执行上述矩阵与向量的乘法运算，最后将结果按列组合。
Special linear transformation:

特殊的线性变换：
Shear , whose transformation matrix is
剪切变换，其变换矩阵为

1 1 0 1 \] \\begin{split}\\begin{bmatrix} 1 \& 1 \\\\ 0 \& 1\\end{bmatrix} \\end{split} \[1011

Linearly dependent transformation: the resulting transformed space is squashed into a 1-dimensional line

线性相关变换：变换后的空间被压缩成一条一维直线

Matrix multiplication as composition

作为复合变换的矩阵乘法

Multiplication of two matrices ==> two-step transformation (composition) of the space geometrically

两个矩阵相乘，从几何角度看，等价于对空间进行两次连续的线性变换（即变换的复合）
This is basically what I summarized in linear transformation:

这与前文线性变换部分的总结一致，如图所示：

This indicates that the order of the matrices multiplication matters, i.e. change the order of matrices will change the result

这表明矩阵乘法的顺序至关重要，即交换矩阵的相乘顺序会改变结果
To compute the determinant

行列式的计算公式为

det ⁡ ( [ a b c d ] ) = a d − b c \begin{split}\det\left(\begin{bmatrix} a & b \\ c & d\end{bmatrix} \right) = ad - bc\end{split} det([acbd])=ad−bc

Three-dimensional linear transformation

三维线性变换

Very similar to two-dimensional transformation

三维线性变换与二维线性变换非常相似
Instead of move/transform the x − x- x− and y − y- y− coordinates, there is just an additional z − z- z−axis to be transformed

区别在于：二维变换仅涉及 x − x- x− 轴和 y − y- y− 轴的变换，而三维变换还需额外考虑 z − z- z− 轴的变换

The determinant

行列式

Geometric explanation: the "scaling" factor a linear transformation imposes to the area (2D) or volume (3D or above) defined by the basis vectors (since the area defined by the basis vectors is 1, the determinant is actually the signed area of the space defined by the column vectors in the transformation matrix)

几何意义：行列式是线性变换对基向量所围成区域（二维中为面积，三维及以上为体积）的"缩放系数"。由于基向量围成的区域面积/体积为 1，因此行列式本质上等于变换矩阵列向量所围成空间的有向面积/体积。
Determinant can be negative, depending on the orientation of the transformation (negative if orientation is inverted/flipped)

行列式可正可负，符号由变换的定向决定：若变换导致空间定向反转（翻转），则行列式为负。
Determinant of zero meaning the transformation leads to 0 scaling to the area/volume or even to a single point; it also corresponds to linearly dependent transformation

行列式为 0 意味着：变换将空间区域的面积/体积缩放至 0（甚至压缩成一个点），这种情况对应线性相关变换。
Right-hand rule: to check the orientation of the transformation in 3D

右手定则：用于判断三维空间中变换的定向

Inverse matrices, column space and null space

逆矩阵、列空间与零空间

Linear system of equations <=> matrix, vector product => linear transformation of a vector
线性方程组 等价于矩阵与向量的乘积，本质是向量的线性变换
Inverse of a matrix => a back transformation corresponds to the matrix itself
矩阵的逆 是该矩阵所对应线性变换的"逆变换"（即撤销原变换的变换）
The inverse of a matrix exists as long as its determinant is non-zero; in contrast, if the determinant is 0, the inverse doesn't exist, i.e. we can expand a lower dimension space to a higher dimension one. There can be infinite possible higher dimensional space corresponds to the lower space in such case.

当且仅当矩阵的行列式非零时，其逆矩阵存在；反之，若行列式为 0，则逆矩阵不存在。这意味着：我们无法将低维空间"扩展"为高维空间，且一个低维空间可能对应无数个高维空间。
In some special cases, when the determinant is 0, there may still be solutions to some vectors, when the output vectors are those lie along the lower dimension line/space after the transformation (why?? )

在某些特殊情况下，即使行列式为 0，部分向量仍可能存在解：当输出向量恰好落在变换后低维直线/空间上时（原因待探究？）
Rank :
秩：
geometrically, it's the number of dimensions of the resulting space/output after a linear transformation, e.g. if the transformation leads to a line, then the rank of the transformation matrix is 1;

几何意义：线性变换后输出空间的维数。例如，若变换将空间压缩成一条直线，则变换矩阵的秩为 1；
Linear algebra: linearly independent columns of the transformation matrix

线性代数定义：变换矩阵中线性无关列向量的个数
Column space of a matrix: set of all possible outputs of the linear transformation <=> the span of columns of the transformation matrix

矩阵的列空间：线性变换所有可能输出向量的集合，等价于变换矩阵列向量的张成空间
Ranks of a matrix is the number of dimensions in its column space; [ 0 0 ] \begin{bmatrix}0\\0\end{bmatrix} [00] is always in the column space

矩阵的秩等于其列空间的维数，且零向量 [ 0 0 ] \begin{bmatrix}0\\0\end{bmatrix} [00] 始终在列空间中
Full rank : when the rank is as high as it can be;
满秩：矩阵的秩达到其最大可能值；
for a full rank transformation, only the original can lands on itself;

满秩变换中，只有零向量变换后仍为自身；
for a non-full rank transformation, there are infinite numbers of vectors can land on the origin; it is a whole line of vectors in 2D cases or whole plane of vectors get squished into the origin in 3D cases (that the number of dimensions lost in the transformation).

非满秩变换中，有无数个向量变换后会落在原点：二维空间中是一整条直线上的向量，三维空间中是一整个平面上的向量（这些向量对应的维数即为变换中"丢失"的维数）。
Null space : the set of vectors that lands on the origin or the kernel of the matrix
零空间 （又称矩阵的核）：变换后落在原点的所有向量的集合

Nonsquare matrices transformations between dimensions

非方阵：不同维度间的变换

Columns of the matrix stands for the number of basis vectors (in input)

矩阵的列数代表输入空间中基向量的个数（即输入空间的维数）
Rows of the matrix stands for the number of coordinates of each landing spots (in output)

矩阵的行数代表输出空间中每个落点的坐标分量个数（即输出空间的维数）

Dot products and duality

点积与对偶性

Geometrical interpretation

几何意义
Project one vector to the other vector, multiply the projected vector's length with the length of the other vector

将一个向量投影到另一个向量上，再将投影向量的长度与被投影向量的长度相乘
Numerical calculation

数值计算