Chapter 18 Extra: Cosine Similarity

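Cosine similarity measures how similar two non-zero vectors are by taking the cosine of the angle between them. Its value ranges from -1 to 1: 1 means the vectors point in exactly the same direction (they need not have the same length), 0 means they are orthogonal (no similarity), and -1 means they point in exactly opposite directions.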
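To make the landmark values concrete, here is a minimal sketch that builds 2D unit vectors at chosen angles and compares each one with a reference vector at 0 degrees. It assumes unit-length vectors, so the dot product alone equals the cosine similarity; the helper names `unit_vector` and `dot` are made up for this illustration.

```python
import math

def unit_vector(angle_deg):
    """Return the 2D unit vector pointing at the given angle in degrees."""
    rad = math.radians(angle_deg)
    return (math.cos(rad), math.sin(rad))

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))

reference = unit_vector(0)
for angle in (0, 60, 90, 180):
    # Both vectors have length 1, so the dot product is the cosine of the angle.
    similarity = dot(reference, unit_vector(angle))
    print(f"{angle:>3} degrees -> {similarity:+.3f}")
# Prints:
#   0 degrees -> +1.000
#  60 degrees -> +0.500
#  90 degrees -> +0.000
# 180 degrees -> -1.000
```

The similarity falls from 1 to -1 as the angle grows from 0 to 180 degrees, regardless of where the reference vector points.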

Given two vectors A and B, their cosine similarity is computed with the following formula:

$\text{similarity} = \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|}$

where:

- $\mathbf{A} \cdot \mathbf{B}$ is the dot product (inner product) of vectors A and B.
- $\|\mathbf{A}\|$ and $\|\mathbf{B}\|$ are the norms (lengths) of vectors A and B, respectively.

Specifically:

- Dot product (inner product): $\mathbf{A} \cdot \mathbf{B} = \sum_{i=1}^{n} A_i B_i$, where $n$ is the dimension of the vectors.
- Norm (length): $\|\mathbf{A}\| = \sqrt{\sum_{i=1}^{n} A_i^2}$.

The formula can be expanded further as:

$\text{similarity} = \frac{\sum\limits_{i=1}^{n} A_i B_i}{\sqrt{\sum\limits_{i=1}^{n} A_i^2} \sqrt{\sum\limits_{i=1}^{n} B_i^2}}$
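The expanded formula maps almost term by term onto plain Python. The sketch below uses only the standard library; the function name `cosine_similarity_sums` and its error handling for mismatched dimensions and zero vectors are assumptions added for illustration, not part of the formula itself.

```python
import math

def cosine_similarity_sums(a, b):
    """Cosine similarity written out with the explicit sums of the formula."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Numerator: sum of A_i * B_i
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: sqrt(sum of A_i^2) * sqrt(sum of B_i^2)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity_sums([3, 4], [4, 3]))  # 24 / 25 = 0.96
```

The zero-vector check reflects the requirement that both vectors be non-zero; without it the division would fail with `ZeroDivisionError`.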

Worked Example

Suppose we have two vectors A and B, where:

- $\mathbf{A} = [1, 2, 3]$
- $\mathbf{B} = [4, 5, 6]$

Applying the formula above gives their cosine similarity step by step:

Dot product:
$\mathbf{A} \cdot \mathbf{B} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32$

Norms:
- $\|\mathbf{A}\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{1 + 4 + 9} = \sqrt{14}$
- $\|\mathbf{B}\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{16 + 25 + 36} = \sqrt{77}$

Cosine similarity:
$\text{similarity} = \frac{32}{\sqrt{14} \sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746$

We can use Python to compute this value:

```python
import numpy as np

# Define the two vectors
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# Compute the dot product
dot_product = np.dot(vector_a, vector_b)

# Compute the norms (lengths)
norm_a = np.linalg.norm(vector_a)
norm_b = np.linalg.norm(vector_b)

# Compute the cosine similarity
cosine_similarity = dot_product / (norm_a * norm_b)

print("Cosine similarity:", cosine_similarity)
```
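For reuse, the same NumPy steps can be wrapped in a function. The version below is one possible sketch: the name `cosine_similarity_checked`, the zero-vector check, and the clipping of the result to the range from -1 to 1 (to absorb floating-point rounding) are all assumptions added here. It also shows that rescaling a vector leaves the similarity unchanged, since only direction matters.

```python
import numpy as np

def cosine_similarity_checked(a, b):
    """Cosine similarity of two 1-D vectors, rejecting zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clip so rounding error cannot push the result outside [-1, 1].
    return float(np.clip(np.dot(a, b) / norm_product, -1.0, 1.0))

a = [1, 2, 3]
print(cosine_similarity_checked(a, [4, 5, 6]))     # approximately 0.9746
print(cosine_similarity_checked(a, [40, 50, 60]))  # same direction as above, same value up to rounding
print(cosine_similarity_checked(a, [2, 4, 6]))     # parallel vectors: approximately 1
print(cosine_similarity_checked(a, [-1, -2, -3]))  # opposite vectors: approximately -1
```

Raising an error for zero vectors is a design choice; returning 0 or NaN is another option, depending on how the caller wants to handle undefined cases.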
