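Calculus for Deep Learning

Calculus

Core goal: understand derivatives, partial derivatives, gradients, and the chain rule, and learn how PyTorch implements automatic differentiation.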

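1. Overview and Learning Path

In deep learning, the central use of calculus is optimization: training a model means repeatedly adjusting its parameters to make the loss function smaller. Which direction to adjust, and by how much, is determined by derivatives and gradients; computing them automatically is the job of autograd.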

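Concept	Mathematical meaning	Deep learning counterpart
Derivative	Instantaneous rate of change of a single-variable function	Sensitivity of the loss to a single parameter
Partial derivative	Effect of one variable with all others held fixed	Sensitivity of a multi-parameter loss to each parameter
Gradient	Vector of all partial derivatives	`param.grad`; points in the direction of steepest increase of the loss
Chain rule	Differentiation of composite functions	Mathematical basis of backpropagation
Automatic differentiation	Mechanical application of the chain rule along the computational graph	`requires_grad` + `backward()`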

Training a model ≈ updating the parameters along the negative gradient direction: $\theta \leftarrow \theta - \eta \nabla L$, where $\eta$ is the learning rate.
Figure (learning-path flowchart), nodes in order: 2 Derivatives, 3 Partial Derivatives and Gradients, 4 The Chain Rule, 5 Automatic Differentiation, Parameter Optimization, Loss Decreases.
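
As a rough illustration of the update rule $\theta \leftarrow \theta - \eta \nabla L$, the sketch below minimizes a one-parameter loss by hand. The loss $L(\theta) = (\theta - 3)^2$, the starting point, the learning rate `eta`, and the step count are all assumptions chosen for illustration, not values from the text.

```python
def loss(theta):
    """Hypothetical loss with its minimum at theta = 3."""
    return (theta - 3.0) ** 2

def grad(theta):
    """Analytic derivative of the loss: dL/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)

theta = 0.0   # assumed starting point
eta = 0.1     # assumed learning rate

history = [loss(theta)]
for step in range(50):
    theta = theta - eta * grad(theta)   # theta <- theta - eta * dL/dtheta
    history.append(loss(theta))

print(f"theta after 50 steps: {theta:.4f}")
print(f"loss went from {history[0]:.4f} to {history[-1]:.2e}")
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, so the recorded loss never increases. A learning rate that is too large would overshoot instead.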

2. Derivatives and Differentials

2.1 Definition

The derivative is the instantaneous rate of change of a function at a point; geometrically, it is the slope of the tangent line at that point.

Given $f: \mathbb{R} \rightarrow \mathbb{R}$:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

Common equivalent notations, with $y = f(x)$: $f'(x) = \dfrac{dy}{dx} = \dfrac{df}{dx}$

Differential: $dy = f'(x) \cdot dx$
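
The limit in the definition can be explored numerically. The following sketch evaluates the difference quotient for shrinking $h$, then compares the differential $dy$ with the actual change in $f$. The function $f(x) = x^2$, the point `x0`, and the helper name `forward_difference` are assumptions made for illustration.

```python
def f(x):
    return x ** 2

def forward_difference(func, x, h):
    """Difference quotient (f(x + h) - f(x)) / h from the definition."""
    return (func(x + h) - func(x)) / h

x0 = 3.0
exact = 2 * x0            # analytic derivative of x^2 at x0

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = forward_difference(f, x0, h)
    print(f"h={h:g}  quotient={approx:.6f}  error={abs(approx - exact):.2e}")

# Differential: dy = f'(x) * dx approximates the actual change for a small dx
dx = 1e-3
dy = exact * dx
actual_change = f(x0 + dx) - f(x0)
print(f"dy={dy:.9f}  actual change={actual_change:.9f}")
```

For this particular function the quotient equals $2x + h$, so the error shrinks in proportion to $h$.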

2.2 Common Differentiation Rules

Rule	Formula
Constant	$\dfrac{d}{dx}C = 0$
Power	$\dfrac{d}{dx}x^n = nx^{n-1}$
Sum	$(f+g)' = f' + g'$
Product	$(fg)' = f'g + fg'$
Quotient	$\left(\dfrac{f}{g}\right)' = \dfrac{f'g - fg'}{g^2}$
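
One way to gain confidence in the product and quotient rules is to compare them with a numerical derivative. The sketch below does this for one arbitrarily chosen pair of functions, $f(x) = \sin x$ and $g(x) = x^3 + 1$. The functions, the evaluation point, and the helper `central_difference` are all illustrative assumptions.

```python
import math

def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Hypothetical f and g with known derivatives
f, df = math.sin, math.cos
g, dg = (lambda x: x ** 3 + 1.0), (lambda x: 3 * x ** 2)

x0 = 1.2

# Product rule: (fg)' = f'g + fg'
product_rule = df(x0) * g(x0) + f(x0) * dg(x0)
product_numeric = central_difference(lambda x: f(x) * g(x), x0)

# Quotient rule: (f/g)' = (f'g - fg') / g^2
quotient_rule = (df(x0) * g(x0) - f(x0) * dg(x0)) / g(x0) ** 2
quotient_numeric = central_difference(lambda x: f(x) / g(x), x0)

print("product :", product_rule, product_numeric)
print("quotient:", quotient_rule, quotient_numeric)
```

The two numbers on each line should agree to many decimal places. A sign error in the quotient rule would show up immediately as a mismatch.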

2.3 Verification Example

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2          # y = x², so dy/dx = 2x

y.backward()
print(x.grad)       # tensor(6.), since 2x = 6 at x = 3
```

3. Partial Derivatives and Gradients

3.1 Partial Derivatives

A loss function usually depends on many parameters. If $y = f(x_1, \dots, x_n)$, the partial derivative with respect to the $i$-th variable is:

$$\frac{\partial y}{\partial x_i} = \lim_{h \to 0} \frac{f(\dots, x_i+h, \dots) - f(\dots, x_i, \dots)}{h}$$

Intuition: treat all other variables as constants and look only at how a small change in $x_i$ affects $y$.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x ** 2 + y ** 3   # z = x² + y³

z.backward()
print(x.grad)   # tensor(4.), ∂z/∂x = 2x
print(y.grad)   # tensor(27.), ∂z/∂y = 3y²
```
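
The autograd results for $z = x^2 + y^3$ can be cross-checked straight from the limit definition by nudging one variable while holding the other fixed. The helper `partial` and its step size are assumptions made for this sketch.

```python
def z(x, y):
    return x ** 2 + y ** 3

def partial(func, args, i, h=1e-6):
    """Central-difference estimate of the partial derivative w.r.t. args[i]."""
    plus = list(args)
    minus = list(args)
    plus[i] += h     # only the i-th variable moves
    minus[i] -= h    # every other variable stays fixed
    return (func(*plus) - func(*minus)) / (2 * h)

point = (2.0, 3.0)
dz_dx = partial(z, point, 0)   # analytic value: 2x = 4
dz_dy = partial(z, point, 1)   # analytic value: 3y^2 = 27
print(dz_dx, dz_dy)
```

Finite differences like this are a common sanity check for gradients, although they are approximate and too slow for models with many parameters.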

3.2 Gradients

For a scalar-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, the gradient is the column vector of all its partial derivatives:

$$\nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right]^\top$$

The gradient is written with the symbol $\nabla$ (nabla).
The gradient points in the direction in which the loss increases fastest; parameter updates therefore move along the negative gradient.
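
The steepest-ascent claim can be checked empirically: take a step of fixed length in many directions and see which one raises the function most. The function $f(x, y) = x^2 + 3y^2$, the point $(1, 1)$, and the step length are assumptions chosen only for this sketch.

```python
import math

def f(x, y):
    """Hypothetical scalar function of two variables."""
    return x ** 2 + 3 * y ** 2

def grad_f(x, y):
    """Analytic gradient: (2x, 6y)."""
    return (2 * x, 6 * y)

x0, y0 = 1.0, 1.0
gx, gy = grad_f(x0, y0)
norm = math.hypot(gx, gy)
step = 1e-3

def increase_along(dx, dy):
    """Change in f after a step of fixed length in unit direction (dx, dy)."""
    return f(x0 + step * dx, y0 + step * dy) - f(x0, y0)

# Try 360 unit directions (one per degree) and keep the one with the largest increase
best_angle = max(range(360), key=lambda deg: increase_along(
    math.cos(math.radians(deg)), math.sin(math.radians(deg))))
grad_angle = math.degrees(math.atan2(gy, gx))

print(f"best direction: {best_angle} deg, gradient direction: {grad_angle:.1f} deg")
print("change along +gradient:", increase_along(gx / norm, gy / norm))
print("change along -gradient:", increase_along(-gx / norm, -gy / norm))
```

The best sampled direction lands within one degree of the gradient direction, and stepping along the negative gradient lowers the function, which is why updates subtract the gradient.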

```python
import torch

w1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w2 = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
b  = torch.tensor(0.5, requires_grad=True)

L = (w1 ** 2).sum() + w2.sum() + b ** 3
L.backward()

print(w1.grad)   # tensor([2., 4., 6.]), ∂L/∂w1 = 2·w1
print(w2.grad)   # all ones, same 2×2 shape as w2, ∂L/∂w2 = 1
print(b.grad)    # tensor(0.7500), ∂L/∂b = 3b²
```

4. The Chain Rule

If $u = g(x)$ and $y = f(u)$, then:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$

A deep neural network is a multi-layer composite function $y = f_n(\cdots f_2(f_1(x)))$. Backpropagation applies the chain rule layer by layer, carrying the error at the output layer back to the parameters of every layer.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y ** 2          # z = x⁴, so dz/dx = 4x³

z.backward()
print(x.grad)       # tensor(32.), since 4 × 2³ = 32 at x = 2
```
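
To show what "applying the chain rule layer by layer" can look like without autograd, here is a hand-written forward and backward pass through a three-layer scalar composition. The three layer functions and the input value are hypothetical, chosen only so that each local derivative is easy to read.

```python
import math

# Hypothetical three-layer composition y = f3(f2(f1(x))); each entry is (f, f')
layers = [
    (lambda v: 3.0 * v + 1.0, lambda v: 3.0),
    (lambda v: v ** 2,        lambda v: 2.0 * v),
    (math.tanh,               lambda v: 1.0 - math.tanh(v) ** 2),
]

x = 0.1

# Forward pass: remember the input that each layer received
inputs = []
value = x
for func, _ in layers:
    inputs.append(value)
    value = func(value)
y = value

# Backward pass: multiply local derivatives from the output back to x
grad = 1.0                       # dy/dy
for (_, dfunc), layer_input in zip(reversed(layers), reversed(inputs)):
    grad *= dfunc(layer_input)   # dy/d(input) = dy/d(output) * local derivative

print("y =", y, " dy/dx =", grad)
```

The backward loop needs the values saved during the forward pass, which is the reason frameworks keep intermediate results alive until the backward pass has run.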

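5. Automatic Differentiation

PyTorch's autograd engine builds a dynamic computational graph and, together with requires_grad and backward(), carries out the differentiation above automatically, so no derivative formulas need to be written by hand.

5.1 Computational Graphs and Backpropagation

A computational graph represents a computation as a directed graph:

Element	Meaning
Node	Tensor (input, parameter, intermediate result, output)
Edge	Operation dependency (which tensor is an input to which)
Forward pass	Input → output, computed all the way to the loss
Backward pass	Output → input, applying the chain rule along the graph to obtain gradients

Example: $y = 2 \cdot \mathbf{x}^\top \mathbf{x}$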
Figure (computational graph): x (leaf tensor) → dot(x, x) → multiplication by the constant 2 → y (scalar output)
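
To make the forward and backward passes concrete, here is a minimal sketch of reverse-mode differentiation on scalars, applied to the same example $y = 2 \cdot \mathbf{x}^\top \mathbf{x}$. The `Value` class and its fields are hypothetical and written only for illustration. This is not how PyTorch is implemented internally, only the same idea at toy scale.

```python
class Value:
    """Hypothetical scalar node in a computational graph (not a PyTorch class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Topological order: a node's gradient is complete before it is passed on
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += node.grad * local   # chain rule, summed over paths


# Forward pass: y = 2 * dot(x, x) for x = [0, 1, 2, 3]
x = [Value(float(i)) for i in range(4)]
dot = sum((xi * xi for xi in x), Value(0.0))
y = 2 * dot

# Backward pass: gradients flow from y back to every x[i]
y.backward()
print(y.data)                  # 28.0
print([xi.grad for xi in x])   # [0.0, 4.0, 8.0, 12.0], i.e. 4x
```

Each operation stores its inputs and local derivatives during the forward pass, and `backward()` then walks the graph in reverse. Gradients are added with `+=` because a node may be reached along several paths, as each `x[i]` is here.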

5.2 The Four-Step Workflow and grad_fn

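```python
import torch

# 1. Enable gradient tracking
x = torch.arange(4.0, requires_grad=True)

# 2. Differentiable operations (the graph is built automatically)
y = 2 * torch.dot(x, x)
print(y.requires_grad)   # True
print(y.grad_fn)         # not None

# 3. Backpropagation
y.backward()

# 4. Read the gradient
print(x.grad)            # tensor([ 0.,  4.,  8., 12.]), i.e. 4x
```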

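Step	API	Purpose
1	`requires_grad=True`	Mark the tensors that need gradients
2	Differentiable operations	Record the computational graph automatically
3	`backward()`	Differentiate backward along the graph
4	`.grad`	Read the resulting gradient

grad_fn records which operation produced a tensor and therefore which differentiation rule backpropagation applies to it:

Tensor type	`grad_fn`
Leaf tensor (created directly, with `requires_grad=True`)	`None`
Intermediate tensor produced by an operation	Not `None` (e.g. `PowBackward0`)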

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y.sum()

print(x.grad_fn)   # None
print(y.grad_fn)   # <PowBackward0 object at 0x...>
print(z.grad_fn)   # <SumBackward0 object at 0x...>
```
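
The distinction between leaf and intermediate tensors can also be inspected with the `is_leaf` attribute, and `grad_fn.next_functions` shows how one backward function links to the ones before it. The tensors below are made up for illustration. Note that a tensor created without `requires_grad=True` is also a leaf with `grad_fn` equal to `None`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)   # created directly, tracked
c = torch.tensor(5.0)                       # created directly, not tracked
y = x ** 2                                  # produced by an operation
z = y * c                                   # produced by an operation

for name, t in [("x", x), ("c", c), ("y", y), ("z", z)]:
    print(f"{name}: is_leaf={t.is_leaf}, requires_grad={t.requires_grad}, grad_fn={t.grad_fn}")

# Each grad_fn links back to the backward functions of its inputs
print(z.grad_fn.next_functions)
```

Following `next_functions` from the output backward is, roughly, the path that `backward()` traverses.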

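5.3 Gradient Accumulation and Zeroing

By default PyTorch accumulates gradients (grad += new gradient), so .grad must be cleared before the next backward() unless accumulation is intended: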

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

(x ** 2).backward()     # x.grad = 4.
x.grad.zero_()          # reset to zero; in a training loop use optimizer.zero_grad()

(x ** 3).backward()     # x.grad = 12.
# Without the reset: 4. + 12. = 16.
```
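
In a training loop the reset usually goes through the optimizer. The following sketch shows where `optimizer.zero_grad()` sits in each step; the single parameter `w`, the loss $(w - 3)^2$, the learning rate, and the number of steps are assumptions made for illustration.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)       # hypothetical single parameter
optimizer = torch.optim.SGD([w], lr=0.1)        # assumed learning rate

losses = []
for step in range(20):
    optimizer.zero_grad()          # clear the gradient left over from the previous step
    loss = (w - 3.0) ** 2          # hypothetical loss with its minimum at w = 3
    loss.backward()                # writes d(loss)/dw into w.grad
    optimizer.step()               # plain SGD: w <- w - lr * w.grad
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

If the `zero_grad()` call were removed, each step would use the sum of all gradients computed so far rather than the current one.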

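5.4 Non-Scalar Outputs

Called without arguments, backward() requires a scalar output; a vector or matrix output must first be reduced to a scalar (alternatively, pass a gradient argument of the same shape as the output):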

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(x ** 2).sum().backward()   # reduce to a scalar with sum first
print(x.grad)               # tensor([2., 4., 6.])
```
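
As an alternative to reducing with `.sum()`, `backward()` accepts a `gradient` argument with the same shape as the output. It acts as a set of weights on the output elements. The weight vector in the second half of this sketch is a made-up example.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                                   # vector output

# Calling y.backward() here with no argument would raise a RuntimeError
y.backward(gradient=torch.ones_like(y))      # weights of 1 reproduce the .sum() result
grad_ones = x.grad.clone()
print(grad_ones)                             # tensor([2., 4., 6.])

x.grad.zero_()
y = x ** 2                                   # recompute; the earlier graph was freed
weights = torch.tensor([1.0, 0.0, 0.5])      # hypothetical per-element weights
y.backward(gradient=weights)                 # same as (weights * y).sum().backward()
print(x.grad)                                # tensor([2., 0., 3.])
```

Passing all ones is equivalent to summing the output, so the first result matches the `.sum()` approach.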

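5.5 Detaching from the Computational Graph (detach)

When part of a computation should not take part in gradient backpropagation, use detach() to cut it out of the computational graph: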

```python
import torch

x = torch.arange(4.0, requires_grad=True)
u = (x * x).detach()        # values are kept, gradient tracking is cut

(u * x).sum().backward()
print(x.grad)               # tensor([0., 1., 4., 9.]); u is treated as a constant
```
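
The effect of `detach()` is easiest to see side by side with the same computation left attached. This sketch computes the gradient of $\sum x^3$ both ways; the variable names `grad_full` and `grad_detached` are introduced here only for the comparison.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# Without detach: d/dx sum(x * x * x) = 3x^2
((x * x) * x).sum().backward()
grad_full = x.grad.clone()

x.grad.zero_()

# With detach: u is a constant, so d/dx sum(u * x) = u = x^2
u = (x * x).detach()
(u * x).sum().backward()
grad_detached = x.grad.clone()

print(grad_full)       # tensor([ 0.,  3., 12., 27.])
print(grad_detached)   # tensor([0., 1., 4., 9.])
print(u.requires_grad, u.grad_fn)   # False None
```

The forward values are identical in both cases; only the path available to the backward pass differs.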

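5.6 Control Flow and Dynamic Graphs

Operations inside loops and branches can be differentiated automatically as well, because the dynamic graph is built along the path that actually executes: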

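```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:   # the number of doublings depends on the value of a
        b = b * 2
    return b if b.sum() > 0 else 100 * b

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()
# For a given input, d = k * a for some constant k, so a.grad equals d / a
print(torch.allclose(a.grad, (d / a).detach()))   # True
```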
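
A smaller, deterministic example may make the branch dependence easier to see: the gradient reported by autograd is that of whichever branch ran for the given input. The function `piecewise` and the two test inputs are hypothetical.

```python
import torch

def piecewise(x):
    """Hypothetical function whose formula depends on the sign of x."""
    if x > 0:
        return x ** 2      # branch A: derivative 2x
    return -3.0 * x        # branch B: derivative -3

grads = {}
for start in (2.0, -1.0):
    x = torch.tensor(start, requires_grad=True)
    out = piecewise(x)      # the graph contains only the branch that actually ran
    out.backward()
    grads[start] = x.grad.item()
    print(f"x={start}: grad_fn={out.grad_fn}, dx={x.grad.item()}")
```

The condition `x > 0` is evaluated as ordinary Python, so it is not itself part of the graph; only the arithmetic in the chosen branch is recorded.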

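6. Common Pitfalls at a Glance

Pitfall	Correct understanding
Confusing derivatives with gradients	Derivative: single variable; gradient: vector of the partial derivatives of a multivariable function
Getting the gradient direction backwards	The gradient points in the direction of increasing loss; updates use the negative gradient
Calling `backward()` on a non-scalar	Reduce to a scalar first, e.g. with `.sum()`
Running backward several times without clearing gradients	Gradients accumulate by default; call `zero_grad()` before each training step
Leaf tensor has `grad_fn` equal to None	This is normal; a non-None `grad_fn` marks an intermediate result
Expecting gradients to flow after `detach`	That branch is treated as a constant, so no gradient flows through it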

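7. Summary

In deep learning, calculus comes down to this: derivatives and gradients tell the parameters which way to move, the chain rule carries the error from the output layer back to the parameters, and autograd does all of this automatically along the computational graph.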