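Overview:
- Cosine similarity is obtained from the law of cosines by expressing the sides of the triangle as vectors.
Take the triangle from the law of cosines section above as an example, and suppose we have two vectors:
- Vector $\mathbf{b}$, pointing from the origin $A$ to point $C$
- Vector $\mathbf{c}$, pointing from the origin $A$ to point $B$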
```text
        A
       / \
      /   \
   c /     \ b
    /       \
   B_________C
        a
```
We now derive the cosine similarity formula from the law of cosines.
To keep the notation clear, write the two vectors in two-dimensional coordinates:
$$\mathbf{b} = (x_1, y_1), \quad \mathbf{c} = (x_2, y_2)$$
Their lengths are then:
$$|\mathbf{b}| = \sqrt{x_1^2 + y_1^2}, \quad |\mathbf{c}| = \sqrt{x_2^2 + y_2^2}$$
The side joining the endpoints of the two vectors (side $a$ in the figure) satisfies:
$$a^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2$$
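As a quick numeric illustration of the three quantities defined so far, here is a minimal Python sketch. The function name `side_lengths` and the sample vectors are assumptions chosen for illustration; they do not come from the text.

```python
import math

def side_lengths(b, c):
    """Return (|b|, |c|, a^2) for two 2D vectors that share the origin A."""
    x1, y1 = b
    x2, y2 = c
    len_b = math.hypot(x1, y1)                    # |b| = sqrt(x1^2 + y1^2)
    len_c = math.hypot(x2, y2)                    # |c| = sqrt(x2^2 + y2^2)
    a_squared = (x1 - x2) ** 2 + (y1 - y2) ** 2   # squared length of side a
    return len_b, len_c, a_squared

# Illustrative values only: b = (3, 4), c = (4, 3)
len_b, len_c, a_squared = side_lengths((3.0, 4.0), (4.0, 3.0))
print(len_b, len_c, a_squared)  # 5.0 5.0 2.0
```

These are exactly the three inputs that the law of cosines needs to recover $\cos\theta$.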
Substituting into the law of cosines:
$$\cos\theta = \frac{|\mathbf{b}|^2 + |\mathbf{c}|^2 - a^2}{2|\mathbf{b}||\mathbf{c}|}$$
$$\cos\theta = \frac{(\sqrt{x_1^2 + y_1^2} \times \sqrt{x_1^2 + y_1^2}) + (\sqrt{x_2^2 + y_2^2} \times \sqrt{x_2^2 + y_2^2}) - [(x_1 - x_2)^2 + (y_1 - y_2)^2]}{2 \times \sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
$$\cos\theta = \frac{(x_1^2 + y_1^2) + (x_2^2 + y_2^2) - [(x_1 - x_2)^2 + (y_1 - y_2)^2]}{2 \times \sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
Expanding the squared terms:
$$(x_1 - x_2)^2 + (y_1 - y_2)^2 = (x_1^2 - 2x_1x_2 + x_2^2) + (y_1^2 - 2y_1y_2 + y_2^2)$$
So, after expansion:
$$\cos\theta = \frac{(x_1^2 + y_1^2) + (x_2^2 + y_2^2) - [x_1^2 - 2x_1x_2 + x_2^2 + y_1^2 - 2y_1y_2 + y_2^2]}{2 \times \sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
Remove the brackets and simplify:
$$\cos\theta = \frac{(x_1^2 + y_1^2) + (x_2^2 + y_2^2) - x_1^2 + 2x_1x_2 - x_2^2 - y_1^2 + 2y_1y_2 - y_2^2}{2 \times \sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
$$\cos\theta = \frac{2x_1x_2 + 2y_1y_2}{2 \times \sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
$$\cos\theta = \frac{2(x_1x_2 + y_1y_2)}{2 \times \sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
$$\cos\theta = \frac{x_1x_2 + y_1y_2}{\sqrt{x_1^2 + y_1^2} \times \sqrt{x_2^2 + y_2^2}}$$
The numerator $x_1x_2 + y_1y_2$ is the dot product $\mathbf{b} \cdot \mathbf{c}$, and the denominator is the product of the two lengths, so:
$$\cos\theta = \frac{\mathbf{b} \cdot \mathbf{c}}{|\mathbf{b}| \cdot |\mathbf{c}|}$$
This is the commonly used cosine similarity formula (in its two-dimensional form).
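To check the derivation numerically, the sketch below computes $\cos\theta$ twice: once from the law of cosines (the starting point) and once from the dot-product form (the end result). The function names and sample vectors are assumptions made for illustration, and both functions assume nonzero vectors, since a zero-length vector would cause a division by zero.

```python
import math

def cos_from_law_of_cosines(b, c):
    """cos(theta) = (|b|^2 + |c|^2 - a^2) / (2 |b| |c|), for nonzero 2D vectors."""
    x1, y1 = b
    x2, y2 = c
    len_b = math.hypot(x1, y1)
    len_c = math.hypot(x2, y2)
    a_squared = (x1 - x2) ** 2 + (y1 - y2) ** 2   # squared length of side a
    return (len_b ** 2 + len_c ** 2 - a_squared) / (2 * len_b * len_c)

def cosine_similarity(b, c):
    """cos(theta) = (b . c) / (|b| |c|), for nonzero 2D vectors."""
    x1, y1 = b
    x2, y2 = c
    dot = x1 * x2 + y1 * y2                       # dot product b . c
    return dot / (math.hypot(x1, y1) * math.hypot(x2, y2))

# Illustrative values only: b = (3, 4), c = (4, 3)
b, c = (3.0, 4.0), (4.0, 3.0)
print(cos_from_law_of_cosines(b, c))  # 0.96
print(cosine_similarity(b, c))        # 0.96
```

Both forms agree, but the dot-product form never needs the side $a$, which is why it is the one normally used in practice.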
Related background:
Perfect-square identities:
$$(a + b)^2 = a^2 + 2ab + b^2$$
$$(a - b)^2 = a^2 - 2ab + b^2$$