Matrix Calculus for Machine Learning: Complete Formulas with Rigorous Derivations

I. Core Notation and Default Conventions

Throughout:

  • Scalars are lowercase: $x, y, a, b$

  • Vectors are bold lowercase: $\boldsymbol x\in\mathbb R^{n\times 1}$, $\boldsymbol y\in\mathbb R^{m\times 1}$ (column vectors by default)

  • Matrices are bold uppercase: $\boldsymbol A\in\mathbb R^{m\times n}$, $\boldsymbol B$, $\boldsymbol C$

  • Layout: numerator layout throughout, as in mainstream machine learning texts:

    $$\frac{\partial\,\text{scalar}}{\partial \boldsymbol x}\ \text{is a row vector},\qquad \frac{\partial \boldsymbol y}{\partial\,\text{scalar}}\ \text{is a column vector}$$


II. Basic Derivative Definitions

1. Scalar with respect to a column vector (row vector by default)

$$\boldsymbol x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix},\qquad \frac{\partial f}{\partial \boldsymbol x} = \begin{bmatrix}\dfrac{\partial f}{\partial x_1}& \dfrac{\partial f}{\partial x_2}&\cdots& \dfrac{\partial f}{\partial x_n}\end{bmatrix}$$

2. Column vector with respect to a scalar (column vector by default)

$$\frac{\partial \boldsymbol y}{\partial x} = \begin{bmatrix}\dfrac{\partial y_1}{\partial x}\\[4pt] \dfrac{\partial y_2}{\partial x}\\[4pt] \vdots\\[4pt] \dfrac{\partial y_m}{\partial x}\end{bmatrix}$$

3. Scalar with respect to a matrix (the gradient keeps the shape of $\boldsymbol X$)

$$\boldsymbol X = (x_{ij}),\qquad \frac{\partial f}{\partial \boldsymbol X} = \begin{bmatrix} \dfrac{\partial f}{\partial x_{11}} & \cdots & \dfrac{\partial f}{\partial x_{1n}}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial f}{\partial x_{m1}} & \cdots & \dfrac{\partial f}{\partial x_{mn}} \end{bmatrix}$$
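These layout conventions are easy to verify numerically. A minimal sketch, assuming NumPy is available; the helper `num_grad` is our own illustrative function, not from the text:

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Finite-difference gradient of a scalar function f at x,
    returned as a ROW vector, per the numerator-layout convention."""
    g = np.zeros((1, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        # Central difference along coordinate j.
        g[0, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

a = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 4.0])
print(num_grad(lambda v: a @ v, x))  # ≈ [[ 1.  -2.   0.5]], i.e. the row vector a^T
```

The 1×n shape of the result is exactly the "scalar over column vector gives a row vector" rule above.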


III. Vector Derivatives

1. Linear form: $\boldsymbol A\boldsymbol x$

Assume $\boldsymbol A$ does not depend on $\boldsymbol x$. Then

$$\frac{\partial (\boldsymbol A\boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol A$$

Derivation:

$$(\boldsymbol A\boldsymbol x)_i = \sum_{k}A_{ik}x_k \;\Rightarrow\; \frac{\partial (\boldsymbol A\boldsymbol x)_i}{\partial x_j} = A_{ij} \;\Rightarrow\; \frac{\partial (\boldsymbol A\boldsymbol x)}{\partial \boldsymbol x}=\boldsymbol A$$
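A quick numerical sanity check of this Jacobian, assuming NumPy (the dimensions 4×3 are arbitrary choices for illustration):

```python
import numpy as np

# Check d(Ax)/dx = A by finite-differencing each output component of x -> Ax.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
x = rng.normal(size=3)
eps = 1e-6

J = np.zeros((4, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    # Column j of the Jacobian: sensitivity of Ax to x_j.
    J[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

print(np.allclose(J, A, atol=1e-6))  # True
```

Since the map is linear, the central difference recovers $\boldsymbol A$ essentially exactly.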


2. Inner-product (linear) form: $\boldsymbol a^\mathrm T \boldsymbol x$ and $\boldsymbol x^\mathrm T \boldsymbol a$

$$\frac{\partial (\boldsymbol a^\mathrm T \boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol a^\mathrm T,\qquad \frac{\partial (\boldsymbol x^\mathrm T \boldsymbol a)}{\partial \boldsymbol x} = \boldsymbol a^\mathrm T$$

Derivation:

$$\boldsymbol a^\mathrm T\boldsymbol x = \sum_k a_k x_k \;\Rightarrow\; \frac{\partial(\boldsymbol a^\mathrm T\boldsymbol x)}{\partial x_j}=a_j$$

In numerator layout, the result is the row vector $\boldsymbol a^\mathrm T$.


3. Quadratic form (the most common case): $\boldsymbol x^\mathrm T \boldsymbol A \boldsymbol x$

$$\boldsymbol A\ \text{symmetric:}\quad \frac{\partial (\boldsymbol x^\mathrm T \boldsymbol A \boldsymbol x)}{\partial \boldsymbol x} = 2\boldsymbol x^\mathrm T \boldsymbol A$$

$$\boldsymbol A\ \text{arbitrary:}\quad \frac{\partial (\boldsymbol x^\mathrm T \boldsymbol A \boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol x^\mathrm T(\boldsymbol A+\boldsymbol A^\mathrm T)$$

Full derivation:

$$f = \boldsymbol x^\mathrm T\boldsymbol A\boldsymbol x =\sum_{i}\sum_j x_i A_{ij}x_j$$

Take the partial derivative with respect to $x_k$:

$$\frac{\partial f}{\partial x_k} =\sum_j A_{kj}x_j + \sum_i x_i A_{ik} =(\boldsymbol A\boldsymbol x)_k + (\boldsymbol A^\mathrm T\boldsymbol x)_k$$

Assemble into a row vector:

$$\frac{\partial f}{\partial \boldsymbol x} = \boldsymbol x^\mathrm T\boldsymbol A^\mathrm T + \boldsymbol x^\mathrm T \boldsymbol A = \boldsymbol x^\mathrm T(\boldsymbol A+\boldsymbol A^\mathrm T)$$

If $\boldsymbol A^\mathrm T=\boldsymbol A$, then:

$$\frac{\partial f}{\partial \boldsymbol x}=2\boldsymbol x^\mathrm T \boldsymbol A$$
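The general (non-symmetric) formula can be checked numerically as well, assuming NumPy; the 4×4 size is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # deliberately NOT symmetric
x = rng.normal(size=4)
eps = 1e-6

# Numerical gradient of f(x) = x^T A x, one coordinate at a time.
g = np.zeros(4)
for j in range(4):
    e = np.zeros(4)
    e[j] = eps
    g[j] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

# Compare with the closed form x^T (A + A^T).
print(np.allclose(g, x @ (A + A.T), atol=1e-5))  # True
```

Replacing `A` with `(A + A.T) / 2` would reproduce the symmetric special case $2\boldsymbol x^\mathrm T\boldsymbol A$.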


IV. Matrix Derivatives: Core Formulas with Derivations

1. Trace derivatives (the universal tool of matrix calculus)

Properties:

  1. $\mathrm{tr}(\boldsymbol A)=\mathrm{tr}(\boldsymbol A^\mathrm T)$

  2. $\mathrm{tr}(\boldsymbol A\boldsymbol B)=\mathrm{tr}(\boldsymbol B\boldsymbol A)$

  3. $\mathrm d(\mathrm{tr}(\boldsymbol A))=\mathrm{tr}(\mathrm d\boldsymbol A)$


2. Scalar with respect to a matrix: $\dfrac{\partial\, \mathrm{tr}(\boldsymbol A^\mathrm T \boldsymbol X)}{\partial \boldsymbol X}$

$$\boxed{\frac{\partial\, \mathrm{tr}(\boldsymbol A^\mathrm T \boldsymbol X)}{\partial \boldsymbol X} = \boldsymbol A}$$

Derivation:

$$\mathrm{tr}(\boldsymbol A^\mathrm T\boldsymbol X) =\sum_{i,j}A_{ij}X_{ij} \;\Rightarrow\; \frac{\partial\, \mathrm{tr}(\boldsymbol A^\mathrm T\boldsymbol X)}{\partial X_{ij}}=A_{ij}$$
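The boxed identity can be confirmed entry by entry with finite differences, assuming NumPy; the 3×4 shape is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(3, 4))
eps = 1e-6

# Perturb each entry X_ij and difference tr(A^T X).
G = np.zeros_like(X)
for i in range(3):
    for j in range(4):
        E = np.zeros_like(X)
        E[i, j] = eps
        G[i, j] = (np.trace(A.T @ (X + E)) - np.trace(A.T @ (X - E))) / (2 * eps)

print(np.allclose(G, A, atol=1e-5))  # True: the gradient is A itself
```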


3. Quadratic matrix form: $\boldsymbol X^\mathrm T \boldsymbol A \boldsymbol X$

$$\frac{\partial\, \mathrm{tr}(\boldsymbol X^\mathrm T \boldsymbol A \boldsymbol X)}{\partial \boldsymbol X} = (\boldsymbol A+\boldsymbol A^\mathrm T)\boldsymbol X$$

Derivation (via differentials):

$$\mathrm d\,\mathrm{tr}(\boldsymbol X^\mathrm T\boldsymbol A\boldsymbol X) = \mathrm{tr}(\mathrm d\boldsymbol X^\mathrm T\,\boldsymbol A\boldsymbol X) + \mathrm{tr}(\boldsymbol X^\mathrm T\boldsymbol A\,\mathrm d\boldsymbol X) = \mathrm{tr}\big(\big((\boldsymbol A+\boldsymbol A^\mathrm T)\boldsymbol X\big)^\mathrm T \mathrm d\boldsymbol X\big)$$

Reading off the coefficient of $\mathrm d\boldsymbol X$ gives the result.
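A finite-difference check of this formula, assuming NumPy; the 4×2 shape of $\boldsymbol X$ is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
X = rng.normal(size=(4, 2))
eps = 1e-6

f = lambda M: np.trace(M.T @ A @ M)  # scalar function of the matrix X

G = np.zeros_like(X)
for i in range(4):
    for j in range(2):
        E = np.zeros_like(X)
        E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(G, (A + A.T) @ X, atol=1e-5))  # True
```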


4. Derivative of a matrix inverse

$$\frac{\partial \boldsymbol A^{-1}}{\partial x} = -\boldsymbol A^{-1}\frac{\partial \boldsymbol A}{\partial x}\boldsymbol A^{-1}$$

Derivation:

Starting from $\boldsymbol A\boldsymbol A^{-1}=\boldsymbol I$, differentiate both sides:

$$\mathrm d\boldsymbol A\cdot \boldsymbol A^{-1} +\boldsymbol A\cdot \mathrm d\boldsymbol A^{-1} = \boldsymbol 0$$

Rearranging:

$$\mathrm d\boldsymbol A^{-1} = -\boldsymbol A^{-1}\,\mathrm d\boldsymbol A\,\boldsymbol A^{-1}$$
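To check this numerically, we need a concrete matrix-valued function of a scalar; a simple hypothetical choice (assuming NumPy) is $\boldsymbol A(x) = \boldsymbol A_0 + x\boldsymbol B$, so that $\partial\boldsymbol A/\partial x = \boldsymbol B$:

```python
import numpy as np

rng = np.random.default_rng(4)
A0 = rng.normal(size=(3, 3)) + 3 * np.eye(3)  # shift by 3I to keep A(x) invertible
B = rng.normal(size=(3, 3))                   # dA/dx = B for A(x) = A0 + x B
inv = np.linalg.inv
x, eps = 0.5, 1e-6

A = lambda t: A0 + t * B
# Central difference of the inverse vs. the closed form -A^{-1} (dA/dx) A^{-1}.
numeric = (inv(A(x + eps)) - inv(A(x - eps))) / (2 * eps)
closed = -inv(A(x)) @ B @ inv(A(x))
print(np.allclose(numeric, closed, atol=1e-4))  # True
```

The diagonal shift in `A0` is only there so that the random matrix stays safely invertible near $x = 0.5$.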


V. Essential Quick-Reference Formula Table

Numerator layout, column vector $\boldsymbol x$:

  1. $\displaystyle \frac{\partial (\boldsymbol A\boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol A$

  2. $\displaystyle \frac{\partial (\boldsymbol x^\mathrm T\boldsymbol A)}{\partial \boldsymbol x} = \boldsymbol A^\mathrm T$

  3. $\displaystyle \frac{\partial (\boldsymbol x^\mathrm T\boldsymbol x)}{\partial \boldsymbol x} = 2\boldsymbol x^\mathrm T$

  4. $\displaystyle \frac{\partial (\boldsymbol x^\mathrm T\boldsymbol A\boldsymbol y)}{\partial \boldsymbol x} = \boldsymbol y^\mathrm T\boldsymbol A^\mathrm T$

  5. $\displaystyle \frac{\partial\, \mathrm{tr}(\boldsymbol X)}{\partial \boldsymbol X}=\boldsymbol I$

  6. $\displaystyle \frac{\partial |\boldsymbol X|}{\partial \boldsymbol X}= |\boldsymbol X|\,(\boldsymbol X^{-1})^\mathrm T$
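Formula 6 (Jacobi's formula for the determinant) is the least obvious entry in the table; a finite-difference check, assuming NumPy, with a diagonal shift to keep the random matrix well away from singularity:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 3)) + 3 * np.eye(3)
eps = 1e-6

# Perturb each entry X_ij and difference det(X).
G = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = eps
        G[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)

# Compare with |X| (X^{-1})^T.
print(np.allclose(G, np.linalg.det(X) * np.linalg.inv(X).T, atol=1e-4))  # True
```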


VI. Worked Example: Least Squares

Loss: $L=\|\boldsymbol A\boldsymbol x-\boldsymbol b\|^2=(\boldsymbol A\boldsymbol x-\boldsymbol b)^\mathrm T(\boldsymbol A\boldsymbol x-\boldsymbol b)$

Expand:

$$L=\boldsymbol x^\mathrm T\boldsymbol A^\mathrm T\boldsymbol A\boldsymbol x -2\boldsymbol b^\mathrm T\boldsymbol A\boldsymbol x +\boldsymbol b^\mathrm T\boldsymbol b$$

Differentiate, using the quadratic-form rule (note $\boldsymbol A^\mathrm T\boldsymbol A$ is symmetric) and the linear-form rule:

$$\frac{\partial L}{\partial \boldsymbol x}= 2\boldsymbol x^\mathrm T\boldsymbol A^\mathrm T\boldsymbol A - 2\boldsymbol b^\mathrm T\boldsymbol A$$

Setting the derivative to $\boldsymbol 0$ gives the normal equations:

$$\boldsymbol A^\mathrm T\boldsymbol A\boldsymbol x = \boldsymbol A^\mathrm T\boldsymbol b$$
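A small end-to-end check, assuming NumPy: solve the normal equations directly and compare with NumPy's built-in least-squares solver (the 10×3 problem size is an arbitrary choice, chosen so $\boldsymbol A$ has full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(10, 3))
b = rng.normal(size=10)

# Solve the normal equations A^T A x = A^T b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... and compare with NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True
```

In practice, `lstsq` (or a QR/SVD-based solver) is preferred over forming $\boldsymbol A^\mathrm T\boldsymbol A$ explicitly, since the normal equations square the condition number of $\boldsymbol A$.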
