Linear Algebra · SVD | On the Early History of the Singular Value Decomposition (Part 2)

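Note: This post reproduces English excerpts related to "Linear Algebra · SVD", each followed by an unproofread machine translation into Chinese.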

If anything looks wrong, please consult the original text.

Because of CSDN's length limit, the article is split into two posts; this is the second.

Linear Algebra · SVD | On the Early History of the Singular Value Decomposition (Part 1), CSDN blog
https://blog.csdn.net/u013669912/article/details/151964576

6. Weyl [64, 1912]

6. 外尔的研究 [64, 1912]

An important application of the approximation theorem is the determination of the rank of a matrix in the presence of error. If $A$ is of rank $k$ and $\tilde{A} = A + E$, then the last $n - k$ singular values of $\tilde{A}$ satisfy

近似定理的一个重要应用是在存在误差的情况下确定矩阵的秩。若矩阵 $A$ 的秩为 $k$，且 $\tilde{A} = A + E$（其中 $E$ 为误差矩阵），则 $\tilde{A}$ 的后 $n - k$ 个奇异值满足
$$\tilde{\sigma}_{k+1}^{2} + \cdots + \tilde{\sigma}_{n}^{2} \leq \| E \|^{2}, \tag{6.1}$$

where $\| \cdot \|$ denotes the Frobenius norm, so that a defect in the rank of $A$ shows up in the size of the trailing singular values of $\tilde{A}$.

因此，矩阵 $A$ 的秩亏损会体现在扰动后矩阵 $\tilde{A}$ 末尾几个奇异值的大小上。
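As a minimal numerical sketch of (6.1), assuming real matrices and reading $\|E\|$ as the Frobenius norm, the NumPy snippet below builds a rank-$k$ matrix, adds a small random perturbation, and checks that the squared trailing singular values of $\tilde{A}$ are bounded by $\|E\|^2$. The sizes, seed, and noise level are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of (6.1): assumes real matrices and the Frobenius norm for ||E||.
rng = np.random.default_rng(0)
m, n, k = 8, 6, 3

# A has rank k by construction (an m-by-k factor times a k-by-n factor).
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
E = 1e-3 * rng.standard_normal((m, n))      # small "error" matrix
A_tilde = A + E

sigma_tilde = np.linalg.svd(A_tilde, compute_uv=False)  # descending order
tail = np.sum(sigma_tilde[k:] ** 2)    # sigma~_{k+1}^2 + ... + sigma~_n^2
bound = np.linalg.norm(E, "fro") ** 2  # ||E||^2

print(sigma_tilde)     # the last n - k values are of the order of the noise
print(tail <= bound)   # True, as (6.1) predicts
```

In practice one would look for a gap between the leading $k$ singular values and the small trailing ones to decide the numerical rank.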

The inequality (6.1) is actually a perturbation theorem for the zero singular values of a matrix. Weyl's contribution to the theory of the singular value decomposition was to develop a general perturbation theory and use it to give an elegant proof of the approximation theorem. Although Weyl treated integral equations with symmetric kernels, in a footnote on Schmidt's contribution he states, "E. Schmidt's theorem, by the way, treats arbitrary (unsymmetric) kernels; however, our proof can also be applied directly to this more general case." Since here we are concerned with the more general case, we will paraphrase Weyl's development as he might have written it for unsymmetric matrices.

不等式(6.1) 本质上是矩阵零奇异值的扰动定理。外尔对奇异值分解理论的贡献在于，他建立了一套通用的扰动理论，并利用该理论为近似定理提供了简洁优雅的证明。尽管外尔的研究对象是具有对称核的积分方程，但他在关于施密特贡献的注释中指出："顺便提一句，埃哈德·施密特的定理适用于任意（非对称）核；而我们提出的证明方法同样可直接应用于这一更一般的情形。"由于本文关注的正是这一更一般的情形，下文将借鉴外尔的研究思路，模拟他可能会如何针对非对称矩阵展开推导。

The location of singular values

奇异值的定位

The heart of Weyl's development is a lemma concerning the singular values of a perturbed matrix. Specifically, if $B_k = XY^T$, where $X$ and $Y$ have $k$ columns (i.e., $\text{rank}(B_k) \leq k$), then

Weyl 推导的核心是一个关于扰动矩阵奇异值的引理。具体来说，如果 $B_k = XY^T$，其中 $X$ 和 $Y$ 均有 $k$ 列（即 $\text{rank}(B_k) \leq k$），则

$$\sigma_1(A - B_k) \geq \sigma_{k+1}(A), \tag{6.2}$$

where $\sigma_i(\cdot)$ denotes the $i$th singular value of its argument.

其中 $\sigma_i(\cdot)$ 表示其参数的第 $i$ 个奇异值。

The proof is simple. Since $Y$ has $k$ columns, there is a linear combination

证明很简单。由于 $Y$ 只有 $k$ 列，存在 $V$ 的前 $k+1$ 列（来自 $A$ 的奇异值分解）的线性组合

$$v = \gamma_1 v_1 + \gamma_2 v_2 + \cdots + \gamma_{k+1} v_{k+1}$$

of the first $k+1$ columns of $V$ (from the singular value decomposition of $A$) such that $Y^T v = 0$; such a combination exists because $Y^T v = 0$ imposes only $k$ linear conditions on the $k+1$ coefficients $\gamma_i$. Without loss of generality we may assume that $\|v\| = 1$, or equivalently that $\gamma_1^2 + \cdots + \gamma_{k+1}^2 = 1$. Since $B_k v = X(Y^T v) = 0$ and $A v_i = \sigma_i u_i$ with orthonormal $u_i$, it follows that

使得 $Y^T v = 0$。不失一般性，可假设 $\|v\| = 1$，或等价地 $\gamma_1^2 + \cdots + \gamma_{k+1}^2 = 1$。由此可得

$$\begin{aligned} \sigma_{1}^{2}(A - B_k) &\geq v^{T}(A - B_k)^{T}(A - B_k)v \\ &= v^{T}A^{T}Av \\ &= \gamma_{1}^{2}\sigma_{1}^{2} + \gamma_{2}^{2}\sigma_{2}^{2} + \cdots + \gamma_{k+1}^{2}\sigma_{k+1}^{2} \\ &\geq \sigma_{k+1}^{2}. \end{aligned}$$
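The construction in the proof can be traced numerically. The sketch below (NumPy; the random test matrices and sizes are illustrative assumptions) finds a unit vector $v$ in the span of $v_1, \dots, v_{k+1}$ with $Y^T v = 0$ by taking a null vector of the $k \times (k+1)$ matrix $Y^T [v_1, \dots, v_{k+1}]$, then checks each link of the chain above. Python indexing is 0-based, so `s[k]` is $\sigma_{k+1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 7, 5, 2
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, k))
Y = rng.standard_normal((n, k))
B_k = X @ Y.T                          # rank(B_k) <= k

U, s, Vt = np.linalg.svd(A)
V_k1 = Vt[: k + 1].T                   # first k+1 right singular vectors v_1..v_{k+1}

# Y^T (V_k1 gamma) = 0 is k equations in k+1 unknowns, so a unit null vector exists;
# the last right singular vector of the k-by-(k+1) matrix M spans its null space.
M = Y.T @ V_k1
gamma = np.linalg.svd(M)[2][-1]        # ||gamma|| = 1
v = V_k1 @ gamma                       # ||v|| = 1 since V_k1 has orthonormal columns

rayleigh = np.linalg.norm((A - B_k) @ v) ** 2      # v^T (A-B_k)^T (A-B_k) v
sigma1 = np.linalg.svd(A - B_k, compute_uv=False)[0]

print(np.allclose(Y.T @ v, 0))                                   # True, so B_k v = 0
print(np.isclose(rayleigh, np.sum(gamma**2 * s[: k + 1] ** 2)))  # True
print(sigma1**2 >= rayleigh, sigma1 >= s[k])                     # True True
```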

Weyl then proves two theorems. The first states that if $A = A' + A''$, then

随后，外尔证明了两个定理。第一个定理指出，若 $A = A' + A''$，则
$$\sigma_{i+j-1}(A) \leq \sigma_{i}(A') + \sigma_{j}(A''), \tag{6.3}$$

where the $\sigma_{i}(A')$ and $\sigma_{i}(A'')$ are the singular values of $A'$ and $A''$ arranged in descending order of magnitude. Weyl begins by establishing (6.3) for $i = j = 1$:

其中，$\sigma_{i}(A')$ 和 $\sigma_{i}(A'')$ 分别表示矩阵 $A'$ 和 $A''$ 按从大到小顺序排列的第 $i$ 个奇异值。外尔首先证明了 $i = j = 1$ 时式 (6.3) 成立：
$$\sigma_{1}(A) = u_{1}^{T} A v_{1} = u_{1}^{T} A' v_{1} + u_{1}^{T} A'' v_{1} \leq \sigma_{1}(A') + \sigma_{1}(A'').$$

Here, $u_1$ and $v_1$ are the first columns of the unitary matrices in the singular value decomposition of $A$, and the inequality holds because $|x^T M y| \leq \sigma_1(M)$ for any unit vectors $x$ and $y$.

在这里，$u_1$ 和 $v_1$ 是矩阵 $A$ 奇异值分解中酉矩阵的第一列。

To establish the result in general, let $A_{i-1}' = \sum_{m=1}^{i-1} \sigma_{m}(A') u_m' v_m'^T$ and $A_{j-1}'' = \sum_{m=1}^{j-1} \sigma_{m}(A'') u_m'' v_m''^T$ be formed in analogy with (5.2). Then $\sigma_{1}(A' - A_{i-1}') = \sigma_{i}(A')$ and $\sigma_{1}(A'' - A_{j-1}'') = \sigma_{j}(A'')$. Moreover, $\text{rank}(A_{i-1}' + A_{j-1}'') \leq (i-1) + (j-1) = i+j-2$. From these facts, the case $i = j = 1$, and (6.2) with $k = i+j-2$, it follows that

为证明该定理在一般情况下成立，参照式 (5.2) 构造矩阵：$A_{i-1}' = \sum_{m=1}^{i-1} \sigma_{m}(A') u_m' v_m'^T$，$A_{j-1}'' = \sum_{m=1}^{j-1} \sigma_{m}(A'') u_m'' v_m''^T$。则有 $\sigma_{1}(A' - A_{i-1}') = \sigma_{i}(A')$，$\sigma_{1}(A'' - A_{j-1}'') = \sigma_{j}(A'')$，且 $\text{rank}(A_{i-1}' + A_{j-1}'') \leq (i-1) + (j-1) = i+j-2$。结合这些结论与式 (6.2) 可推出：
$$\begin{aligned} \sigma_{i}(A') + \sigma_{j}(A'') &= \sigma_{1}(A' - A_{i-1}') + \sigma_{1}(A'' - A_{j-1}'') \\ &\geq \sigma_{1}\left( (A' - A_{i-1}') + (A'' - A_{j-1}'') \right) \\ &= \sigma_{1}\left( A - (A_{i-1}' + A_{j-1}'') \right) \\ &\geq \sigma_{(i+j-2)+1}(A) = \sigma_{i+j-1}(A), \end{aligned}$$

which proves the theorem.

从而完成了定理的证明。
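A quick way to see (6.3) in action is to test every admissible pair $(i, j)$ on random matrices. This is only a sanity check on assumed random inputs, not a proof; the helper `sv` is a hypothetical name for the sorted singular values. Indices are shifted to Python's 0-based convention, so $\sigma_{i+j-1}$ becomes `s[p + q]` with `p = i - 1` and `q = j - 1`.

```python
import numpy as np

def sv(M):
    """Singular values of M in descending order (hypothetical helper)."""
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(2)
m, n = 6, 5
A1 = rng.standard_normal((m, n))   # plays the role of A'
A2 = rng.standard_normal((m, n))   # plays the role of A''
s, s1, s2 = sv(A1 + A2), sv(A1), sv(A2)

# (6.3) with i = p + 1, j = q + 1: sigma_{i+j-1}(A) is s[p + q] in 0-based indexing.
ok = all(
    s[p + q] <= s1[p] + s2[q] + 1e-12
    for p in range(n)
    for q in range(n - p)            # keeps i + j - 1 <= n
)
print(ok)   # True
```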

The second theorem is really a corollary of the first. Set $A' = A - B_k$ and $A'' = B_k$, where, as above, $B_k$ has rank $\leq k$. Since $\sigma_{1}(A'') = \sigma_{1}(B_k) \leq \| B_k \|$ and $\sigma_{k+1}(A'') = 0$ (because $\text{rank}(B_k) \leq k$), we have on setting $j = k+1$ in (6.3),

第二个定理实际上是第一个定理的推论。令 $A' = A - B_k$、$A'' = B_k$（其中 $B_k$ 的秩满足 $\text{rank}(B_k) \leq k$，与前文定义一致）。由于 $\sigma_{1}(A'') = \sigma_{1}(B_k) \leq \| B_k \|$，且 $\sigma_{k+1}(A'') = 0$（因 $\text{rank}(B_k) \leq k$），在式 (6.3) 中令 $j = k+1$ 可得：
$$\sigma_{i}(A - B_k) \geq \sigma_{k+i}(A), \quad i = 1, 2, \dots$$

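As a corollary to this result (square these inequalities, sum them for $i = 1, \dots, n-k$, and recall that the squared Frobenius norm of a matrix is the sum of the squares of its singular values) we obtain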

由该结论可进一步推出推论：
$$\| A - B_k \|^{2} \geq \sigma_{k+1}^{2}(A) + \cdots + \sigma_{n}^{2}(A).$$

This inequality is equivalent to (5.3) and thus establishes the approximation theorem.

该不等式与式 (5.3) 等价，由此证明了近似定理。
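The sketch below (NumPy, random data, sizes chosen for illustration) contrasts the truncated SVD $A_k$ from (5.2), which attains the lower bound on $\|A - B_k\|^2$, with random rank-$k$ matrices, which satisfy $\sigma_i(A - B_k) \geq \sigma_{k+i}(A)$ and therefore cannot do better. The random competitors are a spot check, not an exhaustive search.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD A_k = sum_{i<=k} sigma_i u_i v_i^T (the matrix of (5.2)).
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
tail = np.sum(s[k:] ** 2)              # sigma_{k+1}^2 + ... + sigma_n^2
print(np.isclose(np.linalg.norm(A - A_k, "fro") ** 2, tail))  # True: bound attained

# Random rank-k B_k: sigma_i(A - B_k) >= sigma_{k+i}(A), hence ||A - B_k||^2 >= tail.
for _ in range(100):
    B_k = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    d = np.linalg.svd(A - B_k, compute_uv=False)
    assert np.all(d[: n - k] >= s[k:] - 1e-12)
    assert np.linalg.norm(A - B_k, "fro") ** 2 >= tail - 1e-12
```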

Discussion

讨论

Weyl did not actually write down the development for unsymmetric kernels, and we remind the reader once again of the advisability of consulting original sources. In particular, since symmetric kernels can have negative eigenvalues as well as positive ones, Weyl wrote down three sequences of inequalities: one for positive eigenvalues, one for negative eigenvalues, and one (corresponding to the inequalities presented here) for the absolute values of the eigenvalues.

需要说明的是，外尔并未实际展开非对称核情形下的推导，因此我们再次建议读者查阅原始文献以获取完整信息。具体而言，由于对称核的特征值既有正值也有负值，外尔在研究中推导了三组不等式：一组针对正特征值，一组针对负特征值，还有一组（与本文呈现的不等式对应）针对特征值的绝对值。

Returning to the perturbation problem that opened this section, if in (6.3) we make the identifications $A \leftarrow \tilde{A}$, $A' \leftarrow A$, $A'' \leftarrow E$, and then set $j = 1$, we get

回到本节开篇的扰动问题，在式 (6.3) 中令 $A \leftarrow \tilde{A}$、$A' \leftarrow A$、$A'' \leftarrow E$，并取 $j = 1$，可得：
$$\tilde{\sigma}_{i} \leq \sigma_{i} + \| E \|_2,$$

where $\| E \|_2 = \sigma_1(E)$ is the spectral norm of $E$. On the other hand, if we make the identifications $A' \leftarrow \tilde{A}$ and $A'' \leftarrow -E$ (so that $A' + A'' = A$), then we get

其中，$\| E \|_2 = \sigma_1(E)$ 表示矩阵 $E$ 的谱范数。另一方面，若令 $A' \leftarrow \tilde{A}$、$A'' \leftarrow -E$，则可得：
$$\sigma_{i} \leq \tilde{\sigma}_{i} + \| E \|_2.$$

It follows that

综合以上两式可得：
$$| \tilde{\sigma}_{i} - \sigma_{i} | \leq \| E \|_2, \quad i = 1, 2, \dots, n.$$

Thus, if the singular values of $A$ and $\tilde{A}$ are paired in their natural (descending) order, Weyl's result implies that corresponding singular values cannot differ by more than the spectral norm of the perturbation.

因此，外尔的结论表明：若将矩阵 $A$ 与 $\tilde{A}$ 的奇异值按自然顺序（从大到小）对应，则对应奇异值之间的差值不会超过扰动矩阵 $E$ 的谱范数。
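The bound $|\tilde{\sigma}_i - \sigma_i| \leq \|E\|_2$ can be checked directly. A minimal sketch, assuming random real matrices, with `np.linalg.norm(E, 2)` supplying the spectral norm $\sigma_1(E)$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 5))
E = 0.1 * rng.standard_normal((7, 5))          # perturbation
s = np.linalg.svd(A, compute_uv=False)          # sigma_i, descending
s_tilde = np.linalg.svd(A + E, compute_uv=False)  # tilde sigma_i, descending
spec = np.linalg.norm(E, 2)                     # spectral norm = sigma_1(E)

print(np.max(np.abs(s_tilde - s)) <= spec)      # True
```

Pairing the singular values in sorted order matters: the bound is stated for $\tilde{\sigma}_i$ and $\sigma_i$ with the same index.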

7. Envoi

7. 结语

With Weyl's contribution, the theory of the singular value decomposition can be said to have matured. The subsequent history is one of extensions, new discoveries, and applications. What follows is a brief and selective sketch of these later developments.

随着外尔研究成果的出现，奇异值分解理论可被认为已趋于成熟。此后的研究主要围绕理论拓展、新发现与实际应用展开。下文将有选择地简要介绍这些后续发展。

Extensions

理论拓展

Autonne [2, 1913] extended the decomposition to complex matrices. Eckart and Young [16, 1936], [17, 1939] extended it to rectangular matrices and rediscovered Schmidt's approximation theorem, which is often (and incorrectly) called the Eckart-Young theorem.

奥托恩（Autonne）在 1913 年的文献 [2] 中将奇异值分解推广到复矩阵情形。埃卡特（Eckart）与杨（Young）在 1936 年的文献 [16] 和 1939 年的文献 [17] 中，将其推广到长方矩阵情形，并重新发现了施密特的近似定理；该定理常被（错误地）称为"埃卡特-杨定理"。

8. Nomenclature⁷

8. 术语命名

The term "singular value" seems to have come from the literature on integral equations. A little after the appearance of Schmidt's paper, Bateman [4, 1908] refers to numbers that are essentially the reciprocals of the eigenvalues of the kernel $\underline{A}(s,t)$ as singular values. Picard [45, 1909] combined Schmidt's results with Riesz's theorem on the strong convergence of generalized Fourier series [48, 1907] to establish a necessary and sufficient condition for the existence of solutions of integral equations. In a later paper on the same subject [46, 1910], he notes that for symmetric kernels Schmidt's eigenvalues are real and in this case (but not in general) he calls them singular values. By 1937, Smithies [53] was referring to singular values of an integral equation in our modern sense of the word. Even at this point, usage had not stabilized. In 1949, Weyl [65] speaks of the "two kinds of eigenvalues of a linear transformation," and in a 1969 translation of a 1965 Russian treatise on nonselfadjoint operators, Gohberg and Krein [21] refer to the "s-numbers" of an operator. For the term "principal component," see below.

"奇异值"（singular value）这一术语的起源似乎与积分方程领域的文献相关。在施密特论文发表后不久，贝特曼（Bateman）在 1908 年的文献 [4] 中，将核 $\underline{A}(s,t)$ 特征值的倒数（本质上）称为"奇异值"。皮卡德（Picard）在 1909 年的文献 [45] 中，将施密特的研究成果与里斯（Riesz）关于广义傅里叶级数强收敛的定理（1907 年文献 [48]）相结合，建立了积分方程解存在的充要条件。在后续一篇关于同一主题的论文（1910 年文献 [46]）中，他指出：对于对称核，施密特定义的特征值为实值，且仅在这种情形下（而非一般情形），他将其称为"奇异值"。到 1937 年，史密斯（Smithies）在文献 [53] 中使用的"积分方程奇异值"一词，已与我们现在对"奇异值"的定义一致。即便如此，该术语的使用仍未完全统一：1949 年，外尔在文献 [65] 中仍将其称为"线性变换的两类特征值"；在 1969 年翻译的一本 1965 年苏联关于非自伴算子的专著中，戈德堡（Gohberg）与克赖因（Krein）在文献 [21] 中将其称为算子的"s-数"（s-numbers）。关于"主成分"（principal component）这一术语的由来，详见下文。

⁷ Parts of this passage were taken from [55, p. 35].

⁷ 本段部分内容取自 [55, 第 35 页]。

Unitarily invariant norms

酉不变范数

A matrix norm $\| \cdot \|_u$ is unitarily invariant if $\| U A V \|_u = \| A \|_u$ for all unitary matrices $U$ and $V$. A vector norm $\| \cdot \|_g$ is a symmetric gauge function if $\| P x \|_g = \| x \|_g$ for any permutation matrix $P$ and $\| |x| \|_g = \| x \|_g$ (where $|x|$ denotes the vector of absolute values of the components of $x$). Von Neumann [61, 1937] showed that to any unitarily invariant norm $\| \cdot \|_u$ there corresponds a symmetric gauge function $\| \cdot \|_g$ such that $\| A \|_u = \| (\sigma_1, \dots, \sigma_n)^T \|_g$; i.e., a unitarily invariant norm is a symmetric gauge function of the singular values of its argument.

若对任意酉矩阵 $U$ 和 $V$，均有 $\| U A V \|_u = \| A \|_u$，则称矩阵范数 $\| \cdot \|_u$ 为酉不变范数。若对任意置换矩阵 $P$ 和向量 $x$，均有 $\| P x \|_g = \| x \|_g$，且 $\| |x| \|_g = \| x \|_g$（其中 $|x|$ 表示由 $x$ 各分量绝对值构成的向量），则称向量范数 $\| \cdot \|_g$ 为对称规范函数。冯·诺依曼（Von Neumann）在 1937 年的文献 [61] 中证明：对任意酉不变范数 $\| \cdot \|_u$，均存在对应的对称规范函数 $\| \cdot \|_g$，使得 $\| A \|_u = \| (\sigma_1, \dots, \sigma_n)^T \|_g$；也就是说，酉不变范数可表示为其作用矩阵奇异值的对称规范函数。
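Three familiar unitarily invariant norms illustrate von Neumann's correspondence: the Frobenius, spectral, and nuclear (trace) norms are the $\ell_2$, $\ell_\infty$, and $\ell_1$ norms of the singular values. The sketch below checks this with NumPy, using random orthogonal matrices from QR factorizations as real stand-ins for unitary ones; the pairing table is an illustrative choice, not an exhaustive list.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrices
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.linalg.svd(A, compute_uv=False)

# name -> (matrix norm ||.||_u, symmetric gauge function ||.||_g on singular values)
pairs = {
    "frobenius": (lambda M: np.linalg.norm(M, "fro"), lambda x: np.linalg.norm(x, 2)),
    "spectral": (lambda M: np.linalg.norm(M, 2), lambda x: np.linalg.norm(x, np.inf)),
    "nuclear": (lambda M: np.linalg.norm(M, "nuc"), lambda x: np.linalg.norm(x, 1)),
}
for name, (norm_u, gauge_g) in pairs.items():
    assert np.isclose(norm_u(Q1 @ A @ Q2), norm_u(A))   # unitary (orthogonal) invariance
    assert np.isclose(norm_u(A), gauge_g(s))            # ||A||_u = ||(sigma_1..sigma_n)||_g
    print(name, "ok")
```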

Approximation theorems

近似定理

Schmidt's approximation theorem has been generalized in a number of directions. Mirsky [40, 1960] showed that $A_k$ of (5.2) is a minimizing matrix in any unitarily invariant norm. Cases in which further restrictions are imposed on the minimizing matrix are treated in [12], [22], and [47].

施密特的近似定理已在多个方向上得到推广。米尔斯基（Mirsky）在 1960 年的文献 [40] 中证明：式 (5.2) 定义的 $A_k$ 在任意酉不变范数下均为最优近似矩阵。关于对最优近似矩阵施加额外约束的情形，可参见文献 [12]、[22] 和 [47]。
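Mirsky's result can be spot-checked by comparing the truncated SVD against random rank-$k$ candidates in three unitarily invariant norms. This is a sketch under assumed random data: a random search cannot prove optimality, it only shows that no counterexample turned up. In the spectral and nuclear norms the minimal errors are $\sigma_{k+1}$ and $\sigma_{k+1} + \cdots + \sigma_n$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]   # truncated SVD, as in (5.2)

results = {}
for ord_ in ("fro", 2, "nuc"):      # Frobenius, spectral, nuclear norms
    best = np.linalg.norm(A - A_k, ord_)
    # Random rank-k competitors: a spot check, not an exhaustive search.
    others = [
        np.linalg.norm(A - rng.standard_normal((m, k)) @ rng.standard_normal((k, n)), ord_)
        for _ in range(200)
    ]
    results[ord_] = bool(best <= min(others))

print(results)
```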

Given matrices $A$ and $B$, the Procrustes problem, which arises in the statistical method of factor analysis, is that of determining a unitary matrix $Q$ such that $\|A - BQ\|$ is minimized (see [29, 1962]). Green [25, 1952] and Schönemann [51, 1966] showed that if $U^T A^T B V = \Sigma$ is the singular value decomposition of $A^T B$, then the minimizing matrix is $Q = V U^T$. Rao [47, 1980] considers the more general problem of minimizing $\|P A - B Q\|$, where $P$ and $Q$ are orthogonal.

给定矩阵 $A$ 和 $B$，普罗克拉斯提斯（Procrustes）问题源于因子分析这一统计方法，该问题旨在确定一个酉矩阵 $Q$，使得 $\|A - BQ\|$ 达到最小（参见文献 [29, 1962]）。Green [25, 1952] 和 Schönemann [51, 1966] 证明：若 $U^T A^T B V = \Sigma$ 是 $A^T B$ 的奇异值分解，则使该范数最小的矩阵为 $Q = V U^T$。Rao [47, 1980] 则研究了更一般的问题，即最小化 $\|P A - B Q\|$，其中 $P$ 和 $Q$ 均为正交矩阵。
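A minimal sketch of the SVD solution to the orthogonal Procrustes problem in the Frobenius norm, following the recipe in the text ($A^T B = U \Sigma V^T$, $Q = V U^T$). The function name `procrustes` and the test data are hypothetical, and real orthogonal matrices stand in for unitary ones.

```python
import numpy as np

def procrustes(A, B):
    """Orthogonal Q minimizing ||A - B Q||_F, from the SVD A^T B = U Sigma V^T."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return Vt.T @ U.T                      # Q = V U^T

rng = np.random.default_rng(7)
m, n = 10, 4
B = rng.standard_normal((m, n))
Q_true, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = B @ Q_true + 0.01 * rng.standard_normal((m, n))   # noisy rotated copy of B

Q = procrustes(A, B)
err = np.linalg.norm(A - B @ Q, "fro")
print(np.allclose(Q.T @ Q, np.eye(n)))     # True: Q is orthogonal

# Spot check: no random orthogonal matrix should give a smaller residual.
for _ in range(200):
    R, _ = np.linalg.qr(rng.standard_normal((n, n)))
    assert err <= np.linalg.norm(A - B @ R, "fro") + 1e-12
```

The formula works because $\|A - BQ\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\,\mathrm{tr}(A^T B Q)$, and $\mathrm{tr}(U \Sigma V^T Q)$ is maximized over orthogonal $Q$ by $Q = V U^T$.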

Principal components.

主成分

An alternative to factor analysis is the principal component analysis of Hotelling [27, 1933]. Specifically, if $x^T$ is a random variable with mean zero and common dispersion matrix $D$, and $D = V \Sigma V^T$ is the eigenvalue-eigenvector decomposition of $D$, then the components of $x^T V$ are uncorrelated with variances $\sigma_i$. Hotelling called the transformed variables "the principal components of variance" of $x^T$. If the rows of $X$ consist of independent copies of $x^T$, then the expectation of $X^T X$ is proportional to $D$ (it equals $mD$ when $X$ has $m$ rows). It follows that the matrix $\hat{V}$ obtained from the singular value decomposition of $X$ is an estimate of $V$.

因子分析的一种替代方法是 Hotelling [27, 1933] 提出的主成分分析。具体而言，若 $x^T$ 是一个均值为零、公共散布矩阵为 $D$ 的随机变量，且 $D = V \Sigma V^T$ 是 $D$ 的特征值-特征向量分解，则 $x^T V$ 的各分量互不相关，其方差为 $\sigma_i$。Hotelling 将这些变换后的变量称为 $x^T$ 的"方差主成分"。若矩阵 $X$ 的各行是 $x^T$ 的独立样本，则 $X^T X$ 的期望与 $D$ 成比例。由此可推出，通过 $X$ 的奇异值分解得到的矩阵 $\hat{V}$ 是 $V$ 的一个估计。
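The estimate $\hat{V}$ can be illustrated by sampling from a distribution with a known dispersion matrix. In this NumPy sketch the Gaussian distribution, the variances, and the sample size are assumptions made for illustration. Singular vectors are determined only up to sign, so the comparison uses the absolute cosines between matching columns of $\hat{V}$ and $V$.

```python
import numpy as np

rng = np.random.default_rng(8)
p, m = 3, 20_000
V, _ = np.linalg.qr(rng.standard_normal((p, p)))   # true eigenvectors (orthogonal)
var = np.array([9.0, 4.0, 1.0])                    # true variances, distinct
D = V @ np.diag(var) @ V.T                         # dispersion matrix D = V Sigma V^T

# Rows of X: independent zero-mean samples x^T with dispersion matrix D (Gaussian assumed).
X = rng.multivariate_normal(np.zeros(p), D, size=m)

_, s_hat, Vt_hat = np.linalg.svd(X, full_matrices=False)
V_hat = Vt_hat.T

alignment = np.abs(np.sum(V_hat * V, axis=0))      # |cos| between matching columns
print(alignment)        # each entry close to 1
print(s_hat**2 / m)     # close to var, since E[X^T X] = m D
```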

Hotelling [28, 1936] also introduced canonical correlations between two sets of random variables that bear the same relation to the generalized singular value decomposition as his principal components bear to the singular value decomposition.

Hotelling [28, 1936] 还提出了两组随机变量之间的典型相关（canonical correlation）。这种典型相关与广义奇异值分解的关系，等同于其主成分与奇异值分解的关系。
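The text ties canonical correlations to the generalized singular value decomposition. A closely related and widely used formulation, swapped in here for brevity, computes the sample canonical correlations as the singular values of $Q_x^T Q_y$, where $Q_x$ and $Q_y$ are orthonormal bases (from QR) for the centered data columns. The function name `canonical_correlations` and the synthetic data below are hypothetical.

```python
import numpy as np

def canonical_correlations(Xs, Ys):
    """Sample canonical correlations between two sets of variables (columns).

    Centers each data matrix, takes thin QR factors, and returns the singular
    values of Qx^T Qy, i.e. the cosines of the principal angles between the
    two column spaces.
    """
    Qx, _ = np.linalg.qr(Xs - Xs.mean(axis=0))
    Qy, _ = np.linalg.qr(Ys - Ys.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

rng = np.random.default_rng(9)
m = 5000
z = rng.standard_normal(m)                          # latent signal shared by both sets
Xs = np.column_stack([z + 0.5 * rng.standard_normal(m), rng.standard_normal(m)])
Ys = np.column_stack([z + 0.5 * rng.standard_normal(m), rng.standard_normal(m)])

cc = canonical_correlations(Xs, Ys)
print(cc)   # roughly [0.8, 0.0]; the population values are 1/1.25 = 0.8 and 0
```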

Inequalities involving singular values. Just as Schmidt did not have the last word on approximation theorems, Weyl was not the last to work on inequalities involving singular values. The subject is too voluminous to treat here, and we refer the reader to the excellent survey with references in [26, Chap. 3]. However, mention should be made of a line of research initiated by Weyl [65, 1949] relating the singular values and eigenvalues of a matrix.

涉及奇异值的不等式。正如 Schmidt 并非在逼近定理方面做出最终定论的学者，Weyl 也不是最后一位研究涉及奇异值不等式的学者。该主题内容过于庞杂，无法在此详尽阐述，建议读者参考 [26, 第 3 章] 中包含参考文献的出色综述。不过，值得一提的是 Weyl [65, 1949] 开创的一个研究方向，该方向探讨了矩阵奇异值与特征值之间的关系。
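One well-known result in the line of research begun by Weyl [65, 1949] is the product inequality $|\lambda_1 \cdots \lambda_k| \leq \sigma_1 \cdots \sigma_k$ for $k = 1, \dots, n$, with eigenvalues ordered by decreasing modulus and equality at $k = n$ (both sides then equal $|\det A|$). A numerical spot check on a random unsymmetric matrix, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 6
A = rng.standard_normal((n, n))                         # generally unsymmetric
lam_abs = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]   # |lambda_1| >= ... >= |lambda_n|
s = np.linalg.svd(A, compute_uv=False)                  # sigma_1 >= ... >= sigma_n

for k in range(1, n + 1):
    # Small relative slack for rounding; equality is expected at k = n.
    assert np.prod(lam_abs[:k]) <= np.prod(s[:k]) * (1 + 1e-10)
print(np.isclose(np.prod(lam_abs), np.prod(s)))   # True: both equal |det A|
```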

Computational methods. The singular value decomposition was introduced into numerical analysis by Golub and Kahan [23, 1965], who proposed a computational algorithm. However, it was Golub [24, 1970] who gave the algorithm that has been the workhorse of the past two decades. Recently, Demmel and Kahan [13, 1990] have proposed an interesting alternative.

计算方法。Golub 和 Kahan [23, 1965] 将奇异值分解引入数值分析领域，并提出了相应的计算算法。然而，过去二十年里广泛应用的算法是由 Golub [24, 1970] 提出的。最近，Demmel 和 Kahan [13, 1990] 提出了一种颇具新意的替代算法。
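The first phase of the Golub-Kahan approach reduces $A$ to upper bidiagonal form with alternating left and right Householder reflections, which preserves the singular values. The sketch below shows only that reduction; the helper names are hypothetical and it assumes $m \geq n$. The iterative second phase that diagonalizes the bidiagonal matrix, as in the later algorithms cited above, is omitted, and NumPy's own SVD is used only to check the result.

```python
import numpy as np

def householder_vector(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e1 (v = 0 if x == 0)."""
    v = np.array(x, dtype=float)
    alpha = np.linalg.norm(v)
    if alpha == 0.0:
        return v                      # nothing to annihilate; v = 0 gives the identity
    v[0] += np.copysign(alpha, v[0])  # sign choice avoids cancellation
    return v / np.linalg.norm(v)

def bidiagonalize(A):
    """Reduce an m-by-n matrix (m >= n) to upper bidiagonal B = U^T A V.

    Only B is returned; U and V are applied implicitly as Householder reflections.
    """
    B = np.array(A, dtype=float)
    m, n = B.shape
    for j in range(n):
        # Left reflection: zero the entries below the diagonal in column j.
        v = householder_vector(B[j:, j])
        B[j:, j:] -= 2.0 * np.outer(v, v @ B[j:, j:])
        if j < n - 2:
            # Right reflection: zero the entries right of the superdiagonal in row j.
            w = householder_vector(B[j, j + 1:])
            B[j:, j + 1:] -= 2.0 * np.outer(B[j:, j + 1:] @ w, w)
    return B

rng = np.random.default_rng(11)
A = rng.standard_normal((7, 5))
B = bidiagonalize(A)

# Mask of the entries allowed to be nonzero: diagonal and first superdiagonal.
mask = np.eye(7, 5, dtype=bool) | np.eye(7, 5, k=1, dtype=bool)
print(np.allclose(B[~mask], 0.0))   # True: B is upper bidiagonal
print(np.allclose(np.linalg.svd(B, compute_uv=False),
                  np.linalg.svd(A, compute_uv=False)))  # True: same singular values
```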

Sources. For short biographies of the principals see the Dictionary of Scientific Biography [6], and particularly the articles [6], [14], [15], [42], and [56]. The nearest thing to a systematic survey of the development of matrix decompositions is the chapter on determinants and matrices in Kline's Mathematical Thought from Ancient to Modern Times [35, Chap. 33]. MacDuffee's book, The Theory of Matrices [39], is a gold mine of references to the older literature.

资料来源。关于文中主要人物的简要传记，可参见《科学传记词典》（Dictionary of Scientific Biography）[6]，尤其可参考其中的文献 [6]、[14]、[15]、[42] 和 [56]。最接近系统综述矩阵分解发展历程的文献，当属 Kline 所著《古今数学思想》（Mathematical Thought from Ancient to Modern Times）中关于行列式与矩阵的章节 [35, 第 33 章]。MacDuffee 的著作《矩阵理论》（The Theory of Matrices）[39] 则是收录早期相关文献的宝库。

Acknowledgments. I would like to thank Anne Greenbaum, Nick Higham, David Wood, and Hongyuan Zha for reading and commenting on the manuscript.

致谢。感谢 Anne Greenbaum、Nick Higham、David Wood 以及 Hongyuan Zha（查宏远）阅读本文手稿并提出宝贵意见。
