
Abstract
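Diffusion denoising is not the intuitive process of "gradually erasing noise". It is the reverse process of a stochastic differential equation (SDE). The forward process adds noise step by step according to the noise schedule $\bar\alpha_t$, and the reverse process uses a trained $\epsilon_\theta$ to estimate the noise at each step and subtract it from $x_t$. Mathematically, it unifies Langevin dynamics and score matching. This article works through the underlying principles layer by layer: the Markov chain of forward noising, the derivation of the reverse SDE, the equivalence between noise prediction and the score function, the relationship between signal-to-noise ratio and the noise schedule, the geometric meaning of a single denoising step, the three diffusion frameworks (VE, VP and sub-VP SDE), and finally the denoising loop as implemented in ComfyUI.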
1. Forward Process: Markov Chain and Noise Schedule
The forward process defines how a clean image $x_0$ is noised step by step into pure noise $x_T$. Each step is a conditional Gaussian transition:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$
Through reparameterization, $x_t$ can be reached from $x_0$ in a single step: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, where $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$ and $\epsilon \sim \mathcal{N}(0, I)$.
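The one-step form follows from composing Gaussian transitions. Writing $\alpha_t = 1-\beta_t$ and taking $\epsilon_{t-1}, \epsilon_t$ as independent standard Gaussian noises, two consecutive steps collapse into one because the variances of independent zero-mean Gaussians add:

$$
\begin{aligned}
x_t &= \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t \\
    &= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\,x_{t-2} + \sqrt{1-\alpha_{t-1}}\,\epsilon_{t-1}\right) + \sqrt{1-\alpha_t}\,\epsilon_t \\
    &= \sqrt{\alpha_t\alpha_{t-1}}\,x_{t-2} + \underbrace{\sqrt{\alpha_t(1-\alpha_{t-1})}\,\epsilon_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t}_{\sim\,\mathcal{N}\left(0,\;(\alpha_t - \alpha_t\alpha_{t-1} + 1 - \alpha_t)I\right) \,=\, \mathcal{N}\left(0,\;(1-\alpha_t\alpha_{t-1})I\right)} \\
    &\overset{d}{=} \sqrt{\alpha_t\alpha_{t-1}}\,x_{t-2} + \sqrt{1-\alpha_t\alpha_{t-1}}\,\epsilon
\end{aligned}
$$

Repeating the substitution down to $x_0$ multiplies all the $\alpha_s$ together, which is exactly $\bar\alpha_t$.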
[Figure: forward process x_0 (clean image) → x_1 (+ noise) → x_2 (+ noise) → ... → x_T (pure noise); any step can be reached directly: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε]
// Source: Ho et al. "Denoising Diffusion Probabilistic Models" (2020), Section 2
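```python
# Forward process: jump from x_0 to x_t in one step
import torch

def forward_process(x_0, beta_schedule, t):
    """Sample x_t ~ q(x_t | x_0) in one step (t is a 0-based index into the schedule)."""
    # beta_schedule: tensor of shape [T], the noise variance of each step
    alpha = 1.0 - beta_schedule                 # alpha_t = 1 - beta_t
    alpha_bar = torch.cumprod(alpha, dim=0)     # alpha_bar_t = prod_{s<=t} (1 - beta_s)
    alpha_bar_t = alpha_bar[t]
    # One-step jump: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = torch.randn_like(x_0)
    x_t = torch.sqrt(alpha_bar_t) * x_0 + torch.sqrt(1 - alpha_bar_t) * eps
    return x_t, eps                             # eps is the regression target during training

# SD1.5 noise schedule ("scaled_linear", T=1000): linear in sqrt(beta), then squared
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
# beta_1 = 0.00085, beta_1000 = 0.012
# alpha_bar_1 ≈ 0.999, alpha_bar_1000 ≈ 0.0047
# almost no noise at t=1, almost pure noise at t=1000
```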
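As a sanity check, a minimal sketch can run the Markov chain one transition at a time and compare the empirical statistics of $x_t$ with the closed-form mean $\sqrt{\bar\alpha_t}\,x_0$ and standard deviation $\sqrt{1-\bar\alpha_t}$. It assumes a plain linear β schedule from 0.00085 to 0.012 and a scalar "pixel" value of 1.5, both chosen only for illustration; `iterate_markov_chain` is a hypothetical helper.

```python
import torch

def iterate_markov_chain(x_0, betas, t):
    """Apply q(x_s | x_{s-1}) for s = 1..t, one Gaussian transition at a time."""
    x = x_0
    for s in range(t):
        beta = betas[s]
        x = torch.sqrt(1 - beta) * x + torch.sqrt(beta) * torch.randn_like(x)
    return x

torch.manual_seed(0)
T = 1000
betas = torch.linspace(0.00085, 0.012, T, dtype=torch.float64)  # assumed plain linear schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)

t = 300                                                  # 1-indexed number of forward steps
x_0 = torch.full((200_000,), 1.5, dtype=torch.float64)   # many copies of one "pixel" value
x_t = iterate_markov_chain(x_0, betas, t)

# Closed form: x_t ~ N(sqrt(alpha_bar_t) * x_0, 1 - alpha_bar_t); alpha_bar[t - 1] is alpha_bar_t
expected_mean = torch.sqrt(alpha_bar[t - 1]) * 1.5
expected_std = torch.sqrt(1 - alpha_bar[t - 1])
print(f"empirical mean {x_t.mean():.4f} vs closed form {expected_mean:.4f}")
print(f"empirical std  {x_t.std():.4f} vs closed form {expected_std:.4f}")
```

The index shift (`alpha_bar[t - 1]`) is needed because the tensor is 0-indexed while the text counts steps from 1.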
2. Deriving the Reverse SDE
Reverse denoising is the probabilistic inverse of the forward process and can be derived rigorously with SDEs. The forward SDE is:
$$dx = f(x,t)\,dt + g(t)\,d\mathbf{w}$$
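Anderson's (1982) theorem gives the reverse-time SDE, which runs from $t=T$ back to $t=0$ and is driven by a reverse-time Wiener process $\bar{\mathbf{w}}$: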
$$dx = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{\mathbf{w}}$$
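The reverse SDE can be integrated numerically once the score is known. The sketch below is a toy, not the article's method. It assumes a continuous VP SDE with $f(x,t) = -\tfrac12\beta(t)x$ and $g(t) = \sqrt{\beta(t)}$, a linear $\beta(t)$ from 0.1 to 20 on $t\in[0,1]$ (illustrative values), and 1-D Gaussian data $x_0 \sim \mathcal{N}(0, 0.5^2)$, for which $p_t$ and its score have a closed form. Euler-Maruyama steps backward from pure noise, and the samples should end up with a standard deviation close to 0.5.

```python
import math
import torch

BETA_MIN, BETA_MAX = 0.1, 20.0   # assumed VP-SDE hyperparameters (illustrative)

def beta(t):
    """Linear noise rate beta(t) on t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_bar(t):
    """alpha_bar(t) = exp(-integral_0^t beta(s) ds) for the linear beta(t) above."""
    return math.exp(-(BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t ** 2))

def true_score(x, t, data_std):
    """Exact score of p_t when x_0 ~ N(0, data_std^2), i.e. p_t = N(0, a*s^2 + 1 - a)."""
    a = alpha_bar(t)
    var_t = a * data_std ** 2 + (1 - a)
    return -x / var_t

def reverse_sde_sample(n, data_std, n_steps=1000, t_end=1e-3, seed=0):
    """Euler-Maruyama integration of the reverse SDE from t = 1 down to t_end."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.randn(n, generator=gen, dtype=torch.float64)   # start from N(0, 1), close to p_1
    ts = torch.linspace(1.0, t_end, n_steps + 1, dtype=torch.float64)
    for i in range(n_steps):
        t = ts[i].item()
        dt = (ts[i] - ts[i + 1]).item()                       # positive: we step backward in time
        f = -0.5 * beta(t) * x                                 # forward drift f(x, t)
        g2 = beta(t)                                           # g(t)^2
        drift = f - g2 * true_score(x, t, data_std)            # reverse-SDE drift
        z = torch.randn(n, generator=gen, dtype=torch.float64)
        x = x - drift * dt + math.sqrt(g2 * dt) * z            # move from t to t - dt
    return x

samples = reverse_sde_sample(100_000, data_std=0.5)
print(f"sample std {samples.std():.3f} (target 0.5)")
```

Replacing `true_score` with zero turns the drift into the plain forward drift, and the samples then stay near N(0, 1): the score term is what carries the walk back to the data distribution.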
[Figure: forward SDE dx = f(x,t)dt + g(t)dw → reverse SDE (Anderson 1982) dx = [f − g²∇_x log p_t(x)]dt + g dw̄. The core requirement is the score ∇_x log p_t(x); it cannot be computed directly but is equivalent to denoising: score = −ε/sqrt(1 − ᾱ_t), so training ε_θ to estimate the noise amounts to estimating the score]
// Source: Song et al. "Score-Based Generative Modeling through Stochastic Differential Equations" (2021)
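```python
# Equivalence between the score function and noise prediction
import torch

# Forward jump: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
# q(x_t | x_0) is Gaussian, so its (conditional) score has a closed form:
#   grad_{x_t} log q(x_t | x_0) = -eps / sqrt(1 - alpha_bar_t)
# Minimising E||eps_theta(x_t, t) - eps||^2 makes eps_theta approximate E[eps | x_t],
# and averaging the conditional score over x_0 gives the marginal score:
#   grad_{x_t} log p_t(x_t) = -E[eps | x_t] / sqrt(1 - alpha_bar_t)
#
# Hence training eps_theta(x_t, t) ≈ eps
# is equivalent to training score_theta(x_t, t) ≈ grad log p_t(x_t) (denoising score matching)
#
# Denoising = solving the reverse SDE = using the score to steer a random walk back to the data distribution
def score_from_epsilon(eps_pred, alpha_bar_t):
    """Convert a noise prediction into a score estimate (alpha_bar_t is a tensor)."""
    return -eps_pred / torch.sqrt(1 - alpha_bar_t)
```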
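The step from the conditional score to the marginal score can be checked numerically. In this sketch (toy assumptions: scalar data $x_0 \sim \mathcal{N}(2, 0.5^2)$ and a fixed $\bar\alpha_t = 0.3$), a linear noise predictor is "trained" by least squares on the DDPM objective. Because $x_t$ and $\epsilon$ are jointly Gaussian, the least-squares optimum is exactly $\mathbb{E}[\epsilon \mid x_t]$, so converting it with `score_from_epsilon` should reproduce the exact marginal score of $p_t$.

```python
import torch

torch.manual_seed(0)
mu, s, alpha_bar_t = 2.0, 0.5, 0.3        # assumed toy data N(mu, s^2) and noise level
n = 500_000

x_0 = mu + s * torch.randn(n, dtype=torch.float64)
eps = torch.randn(n, dtype=torch.float64)
x_t = alpha_bar_t ** 0.5 * x_0 + (1 - alpha_bar_t) ** 0.5 * eps

# "Train" eps_theta(x) = w * x + b by least squares on E[(eps_theta(x_t) - eps)^2]
A = torch.stack([x_t, torch.ones_like(x_t)], dim=1)
w, b = torch.linalg.lstsq(A, eps.unsqueeze(1)).solution.squeeze(1)

def score_from_epsilon(eps_pred, alpha_bar_t):
    """score = -eps / sqrt(1 - alpha_bar_t), with a Python-float alpha_bar_t."""
    return -eps_pred / (1 - alpha_bar_t) ** 0.5

def true_marginal_score(x):
    """Exact score of p_t = N(sqrt(a) * mu, a * s^2 + 1 - a)."""
    var_t = alpha_bar_t * s ** 2 + (1 - alpha_bar_t)
    return -(x - alpha_bar_t ** 0.5 * mu) / var_t

x_probe = torch.tensor([-1.0, 0.0, 1.0, 2.0], dtype=torch.float64)
print(score_from_epsilon(w * x_probe + b, alpha_bar_t))
print(true_marginal_score(x_probe))
```

The network in real training is nonlinear, but the principle is the same: the MSE-optimal noise predictor is a conditional expectation, and that expectation is exactly what turns the per-sample $-\epsilon/\sqrt{1-\bar\alpha_t}$ into the marginal score.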
3. DDPM Denoising: Discrete Markov Transitions
DDPM discretizes the reverse SDE into a finite number of Markov transitions; each step estimates $x_{t-1}$ from $x_t$:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I)$$
[Figure: DDPM step: x_t → ε_θ(x_t, t) predicts the noise → x_t − (1 − α_t)/sqrt(1 − ᾱ_t)·ε_pred → divide by sqrt(α_t) → + σ_t·z (noise injection) → x_{t−1}. Choice of σ_t²: β_t (the larger option) or the posterior variance β̃_t = (1 − ᾱ_{t−1})/(1 − ᾱ_t)·β_t (the smaller option)]
// Source: Ho et al. (2020), Algorithm 2
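```python
# DDPM: one denoising step
import torch

def ddpm_step(x_t, t, eps_pred, beta_t, alpha_t, alpha_bar_t):
    """One DDPM step x_t -> x_{t-1}; t is a 0-based index, schedule values are tensors."""
    # Deterministic part: remove the predicted noise from x_t
    coef1 = 1.0 / torch.sqrt(alpha_t)
    coef2 = (1 - alpha_t) / torch.sqrt(1 - alpha_bar_t)
    x_pred = coef1 * (x_t - coef2 * eps_pred)
    # Stochastic part: inject a small amount of fresh noise
    sigma_t = torch.sqrt(beta_t)  # sigma_t^2 = beta_t, the larger of the two choices in Ho et al.
    z = torch.randn_like(x_t) if t > 0 else 0  # no noise on the final step
    return x_pred + sigma_t * z

# Key point: DDPM injects fresh random noise at every step,
# so it is a stochastic sampler, not a deterministic one.
# The injected noise plays the role of the noise term in Langevin dynamics / the reverse SDE.
```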
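To see the full sampler and the effect of the $\sigma_t$ choice, the sketch below runs DDPM ancestral sampling end to end on a toy problem. Assumptions: 1-D data $x_0 \sim \mathcal{N}(0, 0.5^2)$, the linear schedule $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$ from Ho et al., and a hypothetical exact noise predictor `optimal_eps` (the analytic $\mathbb{E}[\epsilon \mid x_t]$) standing in for a trained network. Both $\sigma_t^2 = \beta_t$ and $\sigma_t^2 = \tilde\beta_t$ should give samples with a standard deviation close to 0.5.

```python
import torch

def optimal_eps(x_t, alpha_bar_t, data_std):
    """E[eps | x_t] when x_0 ~ N(0, data_std^2): a stand-in for a perfectly trained eps_theta."""
    var_t = alpha_bar_t * data_std ** 2 + (1 - alpha_bar_t)
    return torch.sqrt(1 - alpha_bar_t) * x_t / var_t

def ddpm_sample(n, data_std, variance="beta", T=1000, seed=0):
    """Ancestral DDPM sampling; variance is "beta" (sigma^2 = beta_t) or "beta_tilde"."""
    gen = torch.Generator().manual_seed(seed)
    betas = torch.linspace(1e-4, 0.02, T, dtype=torch.float64)   # linear schedule of Ho et al.
    alphas = 1 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(n, generator=gen, dtype=torch.float64)       # x_T ~ N(0, I)
    for t in reversed(range(T)):                                  # 0-based t = T-1, ..., 0
        eps_pred = optimal_eps(x, alpha_bars[t], data_std)
        mean = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps_pred) / torch.sqrt(alphas[t])
        if t == 0:
            return mean                                           # no noise on the final step
        if variance == "beta":
            sigma2 = betas[t]
        else:                                                     # posterior variance beta_tilde
            sigma2 = (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t]) * betas[t]
        x = mean + torch.sqrt(sigma2) * torch.randn(n, generator=gen, dtype=torch.float64)

for choice in ("beta", "beta_tilde"):
    samples = ddpm_sample(50_000, data_std=0.5, variance=choice)
    print(f"{choice:10s} sample std {samples.std():.3f} (target 0.5)")
```

With 1000 small steps the two variance choices differ only slightly. Ho et al. note that $\beta_t$ is optimal when $x_0 \sim \mathcal{N}(0, I)$ and $\tilde\beta_t$ when $x_0$ is a single fixed point.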
4. DDIM Denoising: The Mathematics of a Deterministic Reverse Process
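DDIM sets the stochastic term of the reverse process to zero, which yields a deterministic mapping. This can be read as a discretization of a reverse-time ordinary differential equation (ODE).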
$$x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\underbrace{\frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}}_{\text{predicted } x_0} + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(x_t,t)$$
[Figure: DDIM as a reverse-ODE solver. Deterministic: the same input always gives the same output. The eta parameter controls stochasticity (eta = 0: pure ODE, deterministic; eta = 1: equivalent to DDPM, stochastic). Sub-sequence sampling: 1000 steps can be skipped down to 50 without breaking consistency, because the ODE trajectory does not depend on the step count; the step count only affects accuracy]
// Source: Song et al. "Denoising Diffusion Implicit Models" (2021)
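```python
# DDIM: one denoising step
import torch

def ddim_step(x_t, t, t_prev, eps_pred, alpha_bar_t, alpha_bar_prev, eta=0.0):
    """One DDIM step x_t -> x_{t_prev}; alpha_bar_* are scalar tensors (t, t_prev kept for reference)."""
    # Predict x_0 by inverting the forward jump with the predicted noise
    x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
    # Standard deviation of the injected noise, scaled by eta
    sigma = eta * torch.sqrt(
        (1 - alpha_bar_prev) / (1 - alpha_bar_t) * (1 - alpha_bar_t / alpha_bar_prev)
    )
    # Deterministic "direction pointing to x_t" plus optional fresh noise
    dir_xt = torch.sqrt(1 - alpha_bar_prev - sigma**2) * eps_pred
    noise = sigma * torch.randn_like(x_t) if sigma > 0 else 0
    return torch.sqrt(alpha_bar_prev) * x0_pred + dir_xt + noise

# eta=0: purely deterministic (ODE), reproducible
# eta=1: per-step variance equals DDPM's posterior variance beta_tilde, stochastic sampling
# The eta input on ComfyUI's ancestral sampler nodes plays an analogous role: it scales the injected noise
```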
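The determinism and step-skipping claims can be illustrated with a toy setup: 1-D data $\mathcal{N}(0, 0.5^2)$, a hypothetical exact predictor `optimal_eps`, Ho et al.'s linear schedule, $\eta = 0$, and an evenly spaced sub-sequence of timesteps. The even spacing and the choice $\bar\alpha = 1$ for the final step are assumptions; real schedulers offer several spacing options. Two runs from the same $x_T$ should match exactly, and 50 steps should land close to, though slightly less accurately than, 1000 steps.

```python
import torch

def optimal_eps(x_t, alpha_bar_t, data_std):
    """E[eps | x_t] when x_0 ~ N(0, data_std^2): a stand-in for a perfectly trained eps_theta."""
    var_t = alpha_bar_t * data_std ** 2 + (1 - alpha_bar_t)
    return torch.sqrt(1 - alpha_bar_t) * x_t / var_t

def ddim_sample(x_T, data_std, n_steps, T=1000):
    """Deterministic DDIM (eta = 0) over an evenly spaced sub-sequence of the T timesteps."""
    betas = torch.linspace(1e-4, 0.02, T, dtype=torch.float64)
    alpha_bars = torch.cumprod(1 - betas, dim=0)
    timesteps = torch.linspace(T - 1, 0, n_steps, dtype=torch.float64).round().long()
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        last = i + 1 == n_steps
        a_prev = torch.tensor(1.0, dtype=torch.float64) if last else alpha_bars[timesteps[i + 1]]
        eps_pred = optimal_eps(x, a_t, data_std)
        x0_pred = (x - torch.sqrt(1 - a_t) * eps_pred) / torch.sqrt(a_t)
        x = torch.sqrt(a_prev) * x0_pred + torch.sqrt(1 - a_prev) * eps_pred
    return x

gen = torch.Generator().manual_seed(0)
x_T = torch.randn(100_000, generator=gen, dtype=torch.float64)
run_a = ddim_sample(x_T, data_std=0.5, n_steps=50)
run_b = ddim_sample(x_T, data_std=0.5, n_steps=50)
fine = ddim_sample(x_T, data_std=0.5, n_steps=1000)
print("deterministic:", torch.equal(run_a, run_b))
print(f"std with 50 steps {run_a.std():.3f}, with 1000 steps {fine.std():.3f} (target 0.5)")
```

Even with an exact noise predictor, the coarse 50-step discretization shrinks the samples slightly, which is the sense in which the step count "only affects accuracy".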
5. Signal-to-Noise Ratio and the Noise Schedule
$\bar\alpha_t$ defines the signal-to-noise ratio at time $t$: $\text{SNR}(t) = \bar\alpha_t / (1 - \bar\alpha_t)$. The choice of noise schedule therefore directly shapes the quality of the denoising trajectory.
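One consequence of the SNR view: dividing $x_t$ by $\sqrt{\bar\alpha_t}$ gives $x_0 + \sigma_t\epsilon$ with $\sigma_t = \sqrt{(1-\bar\alpha_t)/\bar\alpha_t} = \text{SNR}(t)^{-1/2}$. A VP schedule can therefore be re-expressed as a VE-style noise level $\sigma_t$, which is the parameterization used by k-diffusion-style samplers (the basis of ComfyUI's samplers). The sketch assumes the SD1.5 "scaled_linear" schedule ($\beta$ from 0.00085 to 0.012, linear in $\sqrt\beta$).

```python
import torch

T = 1000
# SD1.5-style "scaled_linear" schedule (assumed): linear in sqrt(beta), then squared
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T, dtype=torch.float64) ** 2
alpha_bar = torch.cumprod(1 - betas, dim=0)

snr = alpha_bar / (1 - alpha_bar)
sigma = snr.rsqrt()                 # sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t) = SNR^(-1/2)

# VP -> VE view: x_t / sqrt(alpha_bar_t) = x_0 + sigma_t * eps
torch.manual_seed(0)
x_0 = torch.randn(4, dtype=torch.float64)
eps = torch.randn(4, dtype=torch.float64)
t = 499                             # 0-based index, i.e. step 500
x_t = alpha_bar[t].sqrt() * x_0 + (1 - alpha_bar[t]).sqrt() * eps

print(f"sigma range: {sigma[0]:.4f} .. {sigma[-1]:.2f}")
print(torch.allclose(x_t / alpha_bar[t].sqrt(), x_0 + sigma[t] * eps))
```

The σ range for this schedule should come out at roughly 0.029 to 14.6.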
[Figure: SNR = ᾱ/(1 − ᾱ) and noise-schedule strategies. Linear: β grows uniformly, and detail is destroyed quickly. Cosine: ᾱ follows a cos² curve, SNR falls smoothly, and detail is better preserved. Truncated schedule (SD1.5: β_1 = 0.00085, β_T = 0.012) avoids the numerical problems of β = 0 at t = 0]
// Source: Nichol & Dhariwal, "Improved Denoising Diffusion Probabilistic Models" (2021), plus the SD training configuration
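```python
# Comparing the SNR of different noise schedules
import math
import torch

def linear_schedule(T=1000, beta_start=0.00085, beta_end=0.012):
    betas = torch.linspace(beta_start, beta_end, T)
    alpha_bar = torch.cumprod(1 - betas, dim=0)
    return alpha_bar

def cosine_schedule(T=1000, s=0.008):
    steps = torch.linspace(0, T, T + 1)
    alpha_bar = torch.cos((steps / T + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bar = alpha_bar / alpha_bar[0]
    return alpha_bar[1:]

# SNR at t=500 (index 499):
# Linear (beta 0.00085 -> 0.012): SNR ≈ 0.19 (noise already dominates at the halfway point)
# Cosine:                         SNR ≈ 0.98 (signal and noise roughly balanced)
# Nichol & Dhariwal introduced the cosine schedule because the linear schedule's final
# steps are nearly pure noise and contribute little to sample quality.
# SD1.5 actually uses "scaled_linear" (linear in sqrt(beta)): it keeps more signal than
# plain linear mid-trajectory, but it is not a cosine schedule.
```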
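The schedule block above uses a plain linear β schedule, whereas SD1.5 uses "scaled_linear" (linear in $\sqrt\beta$, then squared). A short standalone comparison, which re-declares both schedules so it runs on its own, shows how much more signal scaled_linear keeps at the halfway point and where each schedule's SNR first drops below 1. The function names are illustrative.

```python
import torch

def linear_alpha_bar(T=1000, beta_start=0.00085, beta_end=0.012):
    betas = torch.linspace(beta_start, beta_end, T, dtype=torch.float64)
    return torch.cumprod(1 - betas, dim=0)

def scaled_linear_alpha_bar(T=1000, beta_start=0.00085, beta_end=0.012):
    # linear in sqrt(beta), then squared: the "scaled_linear" variant used for SD1.5
    betas = torch.linspace(beta_start ** 0.5, beta_end ** 0.5, T, dtype=torch.float64) ** 2
    return torch.cumprod(1 - betas, dim=0)

def snr(alpha_bar):
    return alpha_bar / (1 - alpha_bar)

for name, ab in [("linear", linear_alpha_bar()), ("scaled_linear", scaled_linear_alpha_bar())]:
    s = snr(ab)
    crossing = (s < 1).nonzero()[0].item() + 1   # first 1-indexed step where noise dominates
    print(f"{name:14s} SNR(t=500) = {s[499]:.3f}, SNR first drops below 1 at t = {crossing}")
```

Scaled_linear roughly doubles the halfway SNR of the plain linear schedule, which is one reason SD1.5 spends more of its trajectory on recognizable structure.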
6. The Geometry of One Denoising Step: Gradient Ascent and Projection onto the Manifold
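Geometrically, a single denoising step works as follows: the direction opposite to the predicted noise, $-\epsilon_\theta$, points toward the data manifold, because the score is $-\epsilon_\theta/\sqrt{1-\bar\alpha_t}$. One denoising step is therefore equivalent to one step of gradient ascent on $\log p_t(x)$ along the score.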
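Both geometric readings can be verified on a toy 2-D Gaussian, where everything is analytic. Assumptions: $x_0 \sim \mathcal{N}(\mu, 0.3^2 I)$ with an arbitrary $\mu$, a fixed $\bar\alpha_t = 0.4$, and an ideal predictor `eps_theta` derived from the exact score. The sketch checks that removing a small multiple of the predicted noise increases $\log p_t$ (gradient ascent), and that the $x_0$ prediction used by DDIM, $(x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta)/\sqrt{\bar\alpha_t} = (x_t + (1-\bar\alpha_t)\nabla\log p_t)/\sqrt{\bar\alpha_t}$, equals the posterior mean $\mathbb{E}[x_0 \mid x_t]$ (Tweedie's formula), which is one way to make "projection onto the data manifold" precise.

```python
import torch

torch.manual_seed(0)
mu = torch.tensor([1.0, -2.0], dtype=torch.float64)   # assumed toy data mean
s, a = 0.3, 0.4                                       # assumed data std and alpha_bar_t
var_t = a * s ** 2 + (1 - a)                          # p_t = N(sqrt(a) * mu, var_t * I)

def log_p_t(x):
    """log p_t(x) up to an additive constant."""
    return -0.5 * ((x - a ** 0.5 * mu) ** 2).sum(-1) / var_t

def score(x):
    return -(x - a ** 0.5 * mu) / var_t

def eps_theta(x):
    """Ideal noise predictor implied by score = -eps / sqrt(1 - alpha_bar_t)."""
    return -(1 - a) ** 0.5 * score(x)

x_0 = mu + s * torch.randn(2, dtype=torch.float64)
x_t = a ** 0.5 * x_0 + (1 - a) ** 0.5 * torch.randn(2, dtype=torch.float64)

# (1) Subtracting a little predicted noise is a gradient-ascent step on log p_t
step = 0.05
x_next = x_t - step * eps_theta(x_t)     # equals x_t + step * sqrt(1 - a) * score(x_t)
print(bool(log_p_t(x_next) > log_p_t(x_t)))

# (2) Tweedie: the DDIM-style x_0 prediction equals the posterior mean E[x_0 | x_t]
x0_pred = (x_t - (1 - a) ** 0.5 * eps_theta(x_t)) / a ** 0.5
x0_posterior = mu + a ** 0.5 * s ** 2 / var_t * (x_t - a ** 0.5 * mu)
print(torch.allclose(x0_pred, x0_posterior))
```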
Figure: geometric meaning of one denoising step.
$\epsilon_\theta$ points along the normal direction of the data manifold. Subtracting the predicted noise moves $x_t$ back toward the manifold, and injecting a small amount of fresh noise is Langevin exploration.
Analogy, noisy gradient descent: plain gradient descent moves deterministically along the gradient. Langevin dynamics adds a random perturbation to each gradient step, which helps it avoid getting stuck in local minima.
Why randomness? A deterministic ODE trajectory may get stuck in a local region of the manifold. Random perturbations help explore the manifold's neighborhood, and different noise realizations lead to different results, which is one source of sample diversity.
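The next example is a small numerical check of the bridge between noise prediction and the score. It assumes 1-D Gaussian data $x_0 \sim \mathcal{N}(0, s^2)$ and a single fixed $\bar\alpha_t$, both chosen only for illustration. For Gaussian data the marginal of $x_t$ is $\mathcal{N}(0, v)$ with $v = \bar\alpha_t s^2 + 1 - \bar\alpha_t$, so the exact score is $-x_t/v$. The MSE-optimal noise predictor $\mathbb{E}[\epsilon \mid x_t]$ is linear here and can be fitted by least squares. The sketch checks that $-\mathbb{E}[\epsilon \mid x_t]/\sqrt{1-\bar\alpha_t}$ reproduces the score.

```python
import numpy as np

# Toy check of score = -E[eps | x_t] / sqrt(1 - alpha_bar).
# Assumptions (illustration only): x_0 ~ N(0, s^2), one fixed alpha_bar.
rng = np.random.default_rng(0)
s, alpha_bar, n = 2.0, 0.3, 200_000

x0 = rng.normal(0.0, s, size=n)
eps = rng.normal(size=n)
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps  # forward noising

# Marginal of x_t is N(0, v), so the exact score is -x / v (slope -1/v).
v = alpha_bar * s**2 + (1 - alpha_bar)
true_score_slope = -1.0 / v

# E[eps | x_t] is linear for jointly Gaussian variables; fit its slope
# by least squares (no intercept needed: both variables have zero mean).
eps_slope = np.dot(x_t, eps) / np.dot(x_t, x_t)

# Convert the noise predictor into a score estimate.
score_from_eps_slope = -eps_slope / np.sqrt(1 - alpha_bar)

print(f"true score slope:         {true_score_slope:.4f}")
print(f"score from eps predictor: {score_from_eps_slope:.4f}")
```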
// Source: Song & Ermon (2019); Welling & Teh (2011)
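```python
import torch

# Langevin-dynamics interpretation of denoising
def langevin_step(x_t, score_fn, step_size, noise_scale=None):
    """One Langevin step = gradient ascent on log p + random perturbation."""
    # Deterministic part: one step along the score function (gradient of log p)
    grad = score_fn(x_t)  # for an eps-prediction model: -eps_theta / sqrt(1 - alpha_bar_t)
    x_determ = x_t + 0.5 * step_size * grad
    # Stochastic part: inject noise to explore the neighborhood.
    # Standard (unadjusted) Langevin dynamics uses noise_scale = sqrt(step_size).
    if noise_scale is None:
        noise_scale = step_size ** 0.5
    x_next = x_determ + noise_scale * torch.randn_like(x_t)
    return x_next

# DDPM ≈ discretized Langevin-type dynamics (noise injected at every step)
# DDIM ≈ deterministic update (eta = 0, no noise injected)
# Both are unified by the reverse SDE / probability-flow ODE framework
```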
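The `langevin_step` sketch above can be run on a target whose score is known in closed form. The version below uses NumPy instead of torch only so that it runs anywhere. The 1-D Gaussian target and the step size are assumptions for the demo. It runs many chains with $\sqrt{h}$-scaled noise, then repeats the same updates with the noise removed. With noise, the chains spread out to the target distribution. Without it, every chain collapses onto the mode.

Note that the deterministic run is plain gradient ascent on a fixed $\log p$, not the probability-flow ODE that DDIM discretizes. That ODE follows a time-dependent score and does preserve diversity, so this demo only illustrates the role of noise in Langevin sampling itself.

```python
import numpy as np

def langevin_step(x, score_fn, step_size, rng):
    """One unadjusted Langevin step: x + (h/2) * score(x) + sqrt(h) * z."""
    z = rng.standard_normal(x.shape)
    return x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * z

# Assumed toy target: N(mu, sigma^2), whose score is -(x - mu) / sigma^2.
mu, sigma = 3.0, 0.5

def score_fn(x):
    return -(x - mu) / sigma**2

rng = np.random.default_rng(0)
step_size, n_steps = 0.01, 1_000

# 50,000 independent chains started far from the target
x = 5.0 * rng.standard_normal(50_000)
for _ in range(n_steps):
    x = langevin_step(x, score_fn, step_size, rng)

# Same updates with the noise term removed: plain gradient ascent on log p
x_det = 5.0 * rng.standard_normal(50_000)
for _ in range(n_steps):
    x_det = x_det + 0.5 * step_size * score_fn(x_det)

print(f"Langevin:        mean {x.mean():.3f}, std {x.std():.3f}  (target {mu}, {sigma})")
print(f"gradient ascent: mean {x_det.mean():.3f}, std {x_det.std():.1e}")
```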
7. Three diffusion frameworks: VE / VP / sub-VP SDE
Figure: the three diffusion frameworks.
VE SDE (Variance Exploding): forward $dx = \sqrt{d[\sigma^2(t)]/dt}\,dw$. The variance explodes while the mean stays unchanged. Used by Score SDE and NCSN.
VP SDE (Variance Preserving): forward $dx = -\tfrac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw$. The variance is preserved while the mean decays. Used by DDPM, SD1.5 and SDXL.
sub-VP SDE: a variant of VP whose variance stays bounded and is never larger than the VP variance. Rarely used on its own.
// Source: Song et al. (2021), Table 1
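The forward SDEs in the figure can be simulated directly. Doing so shows "variance explodes, mean unchanged" for VE and "variance preserved, mean decays" for VP. The sketch below makes these illustrative assumptions:

- toy 1-D data with mean 2 and variance 1;
- for VP, a linear $\beta(t)$ with $\beta_{\min}=0.1$ and $\beta_{\max}=20$;
- for VE, a geometric $\sigma(t)$ from 0.01 to 50.

It integrates both SDEs on $t \in [0, 1]$ and compares the result with the closed-form marginals.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 20_000, 1_000
dt = 1.0 / steps
x0 = rng.normal(2.0, 1.0, size=n)   # toy "data": mean 2, variance 1

# VP SDE with a linear beta(t) on t in [0, 1]; beta_min/beta_max are assumed values
beta_min, beta_max = 0.1, 20.0

def beta(t):
    return beta_min + t * (beta_max - beta_min)

x_vp = x0.copy()
for k in range(steps):               # Euler-Maruyama: dx = -0.5*beta*x*dt + sqrt(beta)*dW
    b = beta(k * dt)
    x_vp += -0.5 * b * x_vp * dt + np.sqrt(b * dt) * rng.standard_normal(n)

# Closed form at t = 1: mean scaled by exp(-0.5 * integral of beta), variance stays 1
m = np.exp(-0.5 * (beta_min + 0.5 * (beta_max - beta_min)))

# VE SDE with geometric sigma(t); sigma_min/sigma_max are assumed values
sigma_min, sigma_max = 0.01, 50.0

def sigma(t):
    return sigma_min * (sigma_max / sigma_min) ** t

x_ve = x0.copy()
for k in range(steps):
    # dx = sqrt(d[sigma^2]/dt) dW; drawing each increment with variance
    # sigma(t+dt)^2 - sigma(t)^2 is exact for this pure-diffusion SDE
    inc_var = sigma((k + 1) * dt) ** 2 - sigma(k * dt) ** 2
    x_ve += np.sqrt(inc_var) * rng.standard_normal(n)

print(f"VP: mean {x_vp.mean():.3f} (closed form {2 * m:.3f}), var {x_vp.var():.3f} (closed form 1)")
print(f"VE: mean {x_ve.mean():.2f} (closed form 2), var {x_ve.var():.0f} "
      f"(closed form {1 + sigma_max**2 - sigma_min**2:.0f})")
```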
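```python
# VE vs VP: mathematical comparison
# VE (Variance Exploding):
#   dx = sqrt(d(sigma^2)/dt) * dw
#   x_t = x_0 + sigma_t * eps            (mean unchanged, variance explodes)
#   Reverse process: needs an estimate of score(x_t, sigma_t)
#   Typical models: NCSN, Score SDE
# VP (Variance Preserving):
#   dx = -0.5 * beta_t * x * dt + sqrt(beta_t) * dw
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
#   Variance preserved: Var(x_t) = alpha_bar_t * Var(x_0) + (1 - alpha_bar_t) = 1 when Var(x_0) = 1
#   Typical models: DDPM, SD1.5, SDXL
# Why SD uses VP:
#   1. The variance is bounded, so training is more stable
#   2. It fits the VAE latent space (latents are rescaled to roughly unit variance)
#   3. The schedule alpha_bar_t directly controls the SNR: SNR(t) = alpha_bar_t / (1 - alpha_bar_t)
```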
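Reason 3 can be made concrete by computing $\bar\alpha_t$ and $\mathrm{SNR}(t) = \bar\alpha_t/(1-\bar\alpha_t)$ for a specific schedule. The sketch assumes the "scaled_linear" schedule commonly used with SD1.5: $\beta$ from 0.00085 to 0.012 on a square-root-linear grid, over 1000 steps. Treat these values as an assumption, not a quote from any particular codebase. The sketch also checks the VP identity $\mathrm{Var}(x_t) = 1$ for unit-variance data.

```python
import numpy as np

# Assumed schedule: "scaled_linear" betas (linear in sqrt(beta), then squared)
T = 1000
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
alpha_bar = np.cumprod(1.0 - betas)

# VP marginal variance for unit-variance data: alpha_bar + (1 - alpha_bar) = 1
var_xt = alpha_bar * 1.0 + (1.0 - alpha_bar)

# Signal-to-noise ratio of x_t, and its log (the usual way to plot SNR curves)
snr = alpha_bar / (1.0 - alpha_bar)
log_snr = np.log(snr)

for t in (0, 249, 499, 749, 999):
    print(f"t={t + 1:4d}  alpha_bar={alpha_bar[t]:.5f}  logSNR={log_snr[t]:+.2f}")
```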
8. The ComfyUI denoising loop: the full path from scheduler to sampler
Figure: random noise $x_T$ → Scheduler (compute the timesteps) → denoising loop: UNet $\epsilon_\theta(x_t, t, c)$ → CFG $\epsilon_u + w(\epsilon_c - \epsilon_u)$ → sampler step (DDPM or DDIM) → $x_{t-1}$, repeated until the loop yields $\hat{x}_0$ → VAE Decode.
// Source: ComfyUI, comfy/samplers.py and comfy/k_diffusion/sampling.py
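```python
import torch

# Simplified denoising loop in the spirit of ComfyUI (illustrative only:
# `model` and `scheduler` are assumed interfaces, not ComfyUI's actual API)
@torch.no_grad()
def denoise_loop(model, scheduler, x_T, cond, uncond, cfg_scale, steps):
    """Full denoising loop: x_T -> x_0_hat."""
    timesteps = scheduler.get_timesteps(steps)  # noise schedule: which t values to visit
    x_t = x_T
    for t in timesteps:
        # 1. UNet predicts the noise. CFG needs two forward passes (conditional and
        #    unconditional); real implementations usually batch them into one call.
        eps_c = model(x_t, t, **cond)
        eps_u = model(x_t, t, **uncond)
        eps_cfg = eps_u + cfg_scale * (eps_c - eps_u)
        # 2. The sampler computes x_{t-1} from x_t and the guided noise estimate
        x_t = scheduler.step(eps_cfg, t, x_t)
    return x_t  # x_0_hat: the final denoised latent, passed on to VAE Decode

# Choices for scheduler.step:
# DDPM:  stochastic, injects noise every step; typically run over all training steps (e.g. 1000)
# DDIM:  deterministic when eta = 0; about 20 steps is often enough
# DPM++: higher-order ODE solver; good quality in roughly 10-20 steps
# Euler: first-order ODE solver; the simplest and cheapest per step
```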
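The simplified loop above leaves `model` and `scheduler` abstract. To run it end to end, the sketch below plugs in two hypothetical stand-ins:

- `ToyDDIMScheduler`: deterministic DDIM ($\eta = 0$) over a DDPM-style linear $\beta$ schedule from 1e-4 to 0.02;
- `make_gaussian_eps_model`: returns the exact $\mathbb{E}[\epsilon \mid x_t]$ for 1-D Gaussian data, in place of a UNet.

Neither is ComfyUI's API. ComfyUI's real samplers in the files cited above are organized differently (largely k-diffusion style, working over sigmas).

With `cfg_scale=1.0` the CFG line reduces to the conditional prediction. The samples should therefore roughly match the conditional data distribution $\mathcal{N}(2, 0.5^2)$. A few-step DDIM run tends to come out slightly under-dispersed.

```python
import numpy as np

class ToyDDIMScheduler:
    """Hypothetical deterministic DDIM (eta = 0) scheduler over a linear beta schedule."""

    def __init__(self, T=1000, beta_start=1e-4, beta_end=0.02):
        betas = np.linspace(beta_start, beta_end, T)
        self.alpha_bar = np.cumprod(1.0 - betas)
        self.T = T
        self._prev = {}

    def get_timesteps(self, steps):
        # evenly spaced timestep indices from T-1 down to 0
        ts = [int(t) for t in np.linspace(self.T - 1, 0, steps).round()]
        # map each timestep to the next one visited (None after the last)
        self._prev = dict(zip(ts, ts[1:] + [None]))
        return ts

    def step(self, eps, t, x_t):
        ab = self.alpha_bar[t]
        nxt = self._prev[t]
        ab_prev = 1.0 if nxt is None else self.alpha_bar[nxt]  # final step lands on x_0
        x0_hat = (x_t - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)  # predicted clean sample
        return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps


def make_gaussian_eps_model(alpha_bar):
    """Hypothetical UNet stand-in: exact E[eps | x_t] for data ~ N(mu, s^2)."""
    def model(x_t, t, mu, s):
        ab = alpha_bar[t]
        v = ab * s**2 + (1.0 - ab)  # Var(x_t)
        return np.sqrt(1.0 - ab) / v * (x_t - np.sqrt(ab) * mu)
    return model


def denoise_loop(model, scheduler, x_T, cond, uncond, cfg_scale, steps):
    """Same structure as the loop above, in NumPy."""
    x_t = x_T
    for t in scheduler.get_timesteps(steps):
        eps_c = model(x_t, t, **cond)
        eps_u = model(x_t, t, **uncond)
        eps_cfg = eps_u + cfg_scale * (eps_c - eps_u)  # classifier-free guidance
        x_t = scheduler.step(eps_cfg, t, x_t)
    return x_t


sched = ToyDDIMScheduler()
model = make_gaussian_eps_model(sched.alpha_bar)
rng = np.random.default_rng(0)
x_T = rng.standard_normal(20_000)  # x_T ~ N(0, I)
samples = denoise_loop(model, sched, x_T,
                       cond={"mu": 2.0, "s": 0.5}, uncond={"mu": 0.0, "s": 1.0},
                       cfg_scale=1.0, steps=50)
print(f"mean {samples.mean():.3f} (data mean 2.0), std {samples.std():.3f} (data std 0.5)")
```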
Summary
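The underlying principle of diffusion denoising can be laid out as a chain:

1. Definition: the forward process is a Markov chain that adds noise according to the schedule $\bar\alpha_t$.
2. Derivation: the reverse process is the reverse-time SDE given by Anderson's theorem.
3. The bridge: the score function and noise prediction are equivalent, $\nabla_x \log p_t(x_t) \approx -\epsilon_\theta(x_t, t)/\sqrt{1-\bar\alpha_t}$.
4. Stochastic sampling: DDPM is a discretized Langevin-type dynamics.
5. Deterministic mapping: DDIM (with $\eta = 0$) discretizes the reverse probability-flow ODE.
6. Scheduling: the noise schedule controls the shape of the SNR curve.
7. SD's choice: the VP framework keeps the variance bounded.

Key insight: denoising is not "erasing noise" but a Langevin random walk along the score function back to the data manifold. Randomness is not a defect; it is a mechanism for exploring the neighborhood of the manifold.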