Continued from Paper Reading Notes: Denoising Diffusion Probabilistic Models (1)
3. Derivation in the Paper
The pipeline of the diffusion model is shown in the figure below. $q(x^{0,1,2\cdots ,T-1, T})$ is the forward noising process and $p(x^{0,1,2\cdots ,T-1, T})$ is the reverse denoising process. Note that the image obtained at the end of reverse denoising still contains some scattered noise points.

3.1 Terminology
$q(x^0)$: $x^0$ denotes an image from the dataset. For example, when using MNIST, $x^0$ is an image from the MNIST dataset, and $q(x^0)$ is the distribution of the images in MNIST.
$p(x^T)$: $x^T$ is the result of adding noise to $x^0$ and is the starting point of reverse denoising, so $p(x^T)$ is the distribution of the denoising starting point.
3.2 Derivation
The forward noising process satisfies the Markov property, which gives Equation 1.
\begin{equation}
\begin{split}
q(x^{0,1,2\cdots,T-1,T})&=q(x^0)\cdot \prod_{t=1}^{T}{q(x^t|x^{t-1})} \\
&=q(x^0)\cdot q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1}). \\
q(x^{1,2 \cdots T}|x^0)&=q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1}).
\end{split}
\end{equation}
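The Markov factorization of the forward process can be checked numerically on a toy chain. The following is a minimal sketch, assuming a hypothetical two-state discrete chain; `q0`, `Q`, and `T` are illustrative values chosen here, not taken from the paper.

```python
import itertools

# Hypothetical 2-state example (illustrative values only):
# q0 is the data distribution q(x^0),
# Q[a][b] is the forward transition q(x^t = b | x^{t-1} = a).
q0 = [0.7, 0.3]
Q = [[0.9, 0.1],
     [0.2, 0.8]]
T = 3

def forward_joint(traj):
    """q(x^0, ..., x^T) = q(x^0) * prod_t q(x^t | x^{t-1})."""
    prob = q0[traj[0]]
    for prev, cur in zip(traj, traj[1:]):
        prob *= Q[prev][cur]
    return prob

# Summing the joint over all 2^(T+1) trajectories must give 1.
total = sum(forward_joint(traj)
            for traj in itertools.product(range(2), repeat=T + 1))
print(total)
```

Because every row of `Q` sums to 1, the product of transitions is a valid joint distribution over trajectories, which is what the sum over all paths confirms.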
The reverse denoising process is given by Equation 2.
\begin{equation}
\begin{split}
p_{\theta}(x^{0,1,2\cdots,T-1,T})&=p_{\theta}(x^T)\cdot \prod_{t=1}^{T}{p_{\theta}(x^{t-1}|x^{t})} \\
&=p_{\theta}(x^T)\cdot p_{\theta}(x^{T-1}|x^T)\cdot p_{\theta}(x^{T-2}|x^{T-1})\dots p_{\theta}(x^{0}|x^{1}).
\end{split}
\end{equation}
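The parameter $\theta$ in Equation 2 is the set of parameters that the deep learning model has to learn. For convenience, $\theta$ is omitted from Equation 2, which is then rewritten as Equation 3.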
\begin{equation}
\begin{split}
p(x^{0,1,2\cdots,T-1,T})&=p(x^T)\cdot \prod_{t=1}^{T}{p(x^{t-1}|x^{t})} \\
&=p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})
\end{split}
\end{equation}
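The reverse factorization reads as a sampling recipe: draw $x^T$ from $p(x^T)$, then repeatedly draw $x^{t-1}$ from $p(x^{t-1}|x^t)$. Below is a small sketch of that recipe, assuming a hypothetical two-state reverse chain; `pT` and `P` are made-up fixed numbers standing in for what a learned model would provide.

```python
import random

# Hypothetical 2-state reverse chain (illustrative values only):
# pT is the start distribution p(x^T); P[a][b] = p(x^{t-1} = b | x^t = a).
pT = [0.5, 0.5]
P = [[0.6, 0.4],
     [0.3, 0.7]]
T = 3

def sample_reverse(rng):
    """Draw (x^T, x^{T-1}, ..., x^0) step by step along the reverse chain."""
    x = rng.choices([0, 1], weights=pT)[0]
    traj = [x]
    for _ in range(T):
        x = rng.choices([0, 1], weights=P[x])[0]
        traj.append(x)
    return traj  # traj[0] is x^T, traj[-1] is x^0

def reverse_joint(traj):
    """p(x^0..x^T) = p(x^T) * prod_t p(x^{t-1} | x^t); traj is ordered x^T -> x^0."""
    prob = pT[traj[0]]
    for cur, nxt in zip(traj, traj[1:]):
        prob *= P[cur][nxt]
    return prob

rng = random.Random(0)
traj = sample_reverse(rng)
print(traj, reverse_joint(traj))
```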
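The goal of reverse denoising is for its endpoint to match the starting point of forward noising. In other words, we want to maximize $p(x^0)$, the probability that the reverse denoising process produces $x^0$.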
\begin{equation}
\begin{split}
p(x^0)&=\int p(x^0,x^1)dx^{1} \quad \text{(marginalizing the joint distribution)}\\
&=\int p(x^1)\cdot p(x^0|x^1)dx^1 \quad \text{(product rule of probability)} \\
&=\int \Big(\int p(x^1,x^2)dx^2 \Big) \cdot p(x^0|x^1)dx^1 \quad \text{(nested integral)}\\
&=\iint p(x^2)\cdot p(x^1|x^2) \cdot p(x^0|x^1)dx^1 dx^2 \quad \text{(rewritten as a double integral)}\\
&= \vdots \\
&= \int \int \cdots \int p(x^T)\cdot p(x^{T-1}|x^{T})\cdot p(x^{T-2}|x^{T-1})\cdots p(x^0|x^1) \cdot dx^1 dx^2 \cdots dx^T \\
&= \int p(x^{0,1,2 \cdots T})dx^{1,2\cdots T} \quad \text{(a } T\text{-fold integral)} \\
&= \int dx^{1,2\cdots T} \cdot p(x^{0,1,2 \cdots T}) \cdot \frac{q(x^{1,2 \cdots T}| x^0)}{q(x^{1,2 \cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^{0,1,2 \cdots T}) }{q(x^{1,2 \cdots T}|x^0)} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \frac{ p(x^T)\cdot p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \frac{ p(x^{T-1}|x^T)\cdot p(x^{T-2}|x^{T-1})\dots p(x^{0}|x^{1})}{q(x^1|x^0)\cdot q(x^2|x^1)\dots q(x^T|x^{T-1})} \\
&= \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \\
&= E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Big[ p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})} \Big] \quad \text{(rewritten as an expectation)}
\end{split}
\end{equation}
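Rewriting $p(x^0)$ as an expectation under $q(x^{1,2\cdots T}|x^0)$ means it can be estimated by sampling forward trajectories and averaging the ratio of the two joint probabilities. The sketch below illustrates this on a hypothetical two-state chain; `Q`, `P`, `pT`, and the sample count are assumptions made for the example and do not come from the paper.

```python
import itertools
import random

# Hypothetical 2-state chains (illustrative values, not from the paper).
Q = [[0.9, 0.1], [0.2, 0.8]]   # Q[a][b] = q(x^t = b | x^{t-1} = a)
P = [[0.6, 0.4], [0.3, 0.7]]   # P[a][b] = p(x^{t-1} = b | x^t = a)
pT = [0.5, 0.5]                # p(x^T)
T = 3

def p_joint(path):
    """p(x^0..x^T) for path = (x^0, ..., x^T)."""
    prob = pT[path[-1]]
    for t in range(T, 0, -1):
        prob *= P[path[t]][path[t - 1]]
    return prob

def q_cond(path):
    """q(x^1..x^T | x^0) for path = (x^0, ..., x^T)."""
    prob = 1.0
    for prev, cur in zip(path, path[1:]):
        prob *= Q[prev][cur]
    return prob

def exact_marginal(x0):
    """p(x^0) by summing the joint over all (x^1, ..., x^T)."""
    return sum(p_joint((x0,) + rest)
               for rest in itertools.product(range(2), repeat=T))

def mc_estimate(x0, n, rng):
    """Average of p(x^0..x^T) / q(x^1..x^T | x^0) over forward samples."""
    total = 0.0
    for _ in range(n):
        path = [x0]
        for _step in range(T):
            path.append(rng.choices([0, 1], weights=Q[path[-1]])[0])
        total += p_joint(path) / q_cond(path)
    return total / n

exact = exact_marginal(0)
estimate = mc_estimate(0, 20000, random.Random(0))
print(exact, estimate)
```

The two printed numbers should be close. The estimate is noisy, so it only agrees with the exact marginal up to Monte Carlo error.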
Hence the parameter $\theta$ (omitted in Equation 3) should satisfy
\begin{equation}
\theta= \underset{\theta}{\arg\max}\; p_{\theta}(x^0).
\end{equation}
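Equation 4 is written for a single image of the dataset, but a dataset usually contains thousands of images. Assuming the dataset contains $N$ images, we arrive at Equation 6, whose goal is to find a set of parameters $\theta$ that minimizes $L$ (equivalently, maximizes the likelihood $p(x^0)$). Note that $q(x^0)$ is the probability with which each image in the dataset is sampled.
To avoid edge effects, this paper sets $p(x^1|x^{0})=q(x^1|x^{0})$.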
\begin{equation}
\begin{split}
L&:=-\log\Big(p(x^0)\Big) \\
&= -\log \Big( E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Big[ p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big] \Big) \\
& \leq -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log \Big( p(x^T)\cdot \prod_{t=1}^{T} \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big)\bigg) \quad \text{(Jensen's inequality)}\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log p(x^T)+\sum_{t=1}^{T} \log \Big( \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big)\bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{q(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big( \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1})}\Big) \bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{\underbrace{ p(x^1|x^{0})}_{p(x^1|x^{0})=q(x^1|x^{0})}} \Big)+\sum_{t=2}^{T} \log \Big(\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^t|x^{t-1},x^0)}}_{q(x^t|x^{t-1})=q(x^t|x^{t-1},x^0)}\Big) \bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big(\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^t,x^{t-1},x^0)} \cdot q(x^{t-1}, x^0) \cdot \frac{q(x^0)}{q(x^0)}\cdot \frac{q(x^t,x^0)}{q(x^t,x^0)}}_{ q(x^t|x^{t-1},x^0)=\frac{q(x^t,x^{t-1},x^0)}{q(x^{t-1},x^0)}}\Big) \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big(\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \cdot \frac{q(x^{t-1}, x^0) }{q(x^0)}\cdot \frac{ q(x^0)}{q(x^t,x^0)}}_{q(x^t,x^{t-1},x^0)= q(x^t,x^0) \cdot q(x^{t-1}|x^t,x^0)}\Big) \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big(\underbrace{ \frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \cdot \frac{q(x^{t-1}| x^0) }{q(x^{t}|x^0)}}_{q(x^{t-1},x^0)=q(x^0) \cdot q(x^{t-1}|x^0) ;\ q(x^{t},x^0)=q(x^0) \cdot q(x^{t}|x^0)}\Big) \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big(\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big) + \sum_{t=2}^{T} \log \Big(\frac{q(x^{t-1}| x^0) }{q(x^{t}|x^0)}\Big) \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big(\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big) + \log \Big(\frac{q(x^{1}| x^0) }{q(x^{2}|x^0)} \cdot \frac{q(x^{2}| x^0) }{q(x^{3}|x^0)}\cdots \frac{q(x^{T-1}| x^0) }{q(x^{T}|x^0)}\Big) \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log p(x^T)+ \log\Big(\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big)+\sum_{t=2}^{T} \log \Big(\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big) + \log \Big(\frac{q(x^{1}| x^0) }{q(x^{T}|x^0)}\Big) \Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\underbrace{\log \Big(\frac{p(x^T)}{q(x^{T}|x^0)}\Big)+\sum_{t=2}^{T} \log \Big(\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big) + \log \Big(p(x^{0}|x^1)\Big) }_{\log p(x^T)+\log\Big[\frac{ p(x^{0}|x^1)}{p(x^1|x^{0})} \Big]+ \log \Big(\frac{q(x^{1}| x^0) }{q(x^{T}|x^0)}\Big)=\log\bigg(p(x^T) \cdot \frac{ p(x^{0}|x^1)}{\bcancel{p(x^1|x^{0})}} \cdot \frac{\bcancel{q(x^{1}| x^0) }}{q(x^{T}|x^0)} \bigg)}\Bigg)\\
&= -E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big( \frac{ p(x^T)}{q(x^{T}|x^0)}\Big)\Bigg)-E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big(\frac{ p(x^{t-1}|x^t)}{q(x^{t-1}|x^t,x^0)} \Big)\Bigg) - E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log \Big(p(x^{0}|x^1)\Big) \Bigg)\\
&= E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)}\Big)\Bigg)+E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big)\Bigg) - E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg( \log \Big(p(x^{0}|x^1)\Big) \Bigg)\\
&= \underbrace{E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)}\Big)\Bigg)}_{L_1}+\underbrace{E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big)\Bigg)}_{L_2} - \underbrace{\log \Big(p(x^{0}|x^1)\Big)}_{L_3:\ \text{a constant?}}
\end{split}
\end{equation}
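The inequality step in this derivation is Jensen's inequality applied to the concave logarithm: the negative log of an expectation is at most the expectation of the negative log. As a quick sanity check, here is a minimal sketch for the simplest case $T=1$ with two states; all probability tables are hypothetical values picked for illustration.

```python
import math

# Hypothetical one-step example (T = 1) with two states; values are illustrative.
p1 = [0.5, 0.5]                    # p(x^1)
p0_given_1 = [[0.6, 0.4],          # p0_given_1[a][b] = p(x^0 = b | x^1 = a)
              [0.3, 0.7]]
q1_given_0 = [[0.9, 0.1],          # q1_given_0[a][b] = q(x^1 = b | x^0 = a)
              [0.2, 0.8]]

def neg_log_likelihood(x0):
    """L = -log p(x^0), with p(x^0) = sum over x^1 of p(x^1) p(x^0 | x^1)."""
    return -math.log(sum(p1[x1] * p0_given_1[x1][x0] for x1 in range(2)))

def upper_bound(x0):
    """-E_q[ log( p(x^1) p(x^0 | x^1) / q(x^1 | x^0) ) ], the Jensen bound."""
    return -sum(
        q1_given_0[x0][x1]
        * math.log(p1[x1] * p0_given_1[x1][x0] / q1_given_0[x0][x1])
        for x1 in range(2)
    )

for x0 in range(2):
    print(x0, neg_log_likelihood(x0), upper_bound(x0))
```

For each `x0` the second number should not be smaller than the first, so minimizing the bound pushes the true negative log-likelihood down as well.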
As can be seen, $L$ splits into three terms. First consider the first term $L_1$.
\begin{equation}
\begin{split}
L_1&=E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)}\Big)\Bigg) \\
&=\int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)}\Big) \\
&=\int dx^{1,2\cdots T} \cdot \frac{q(x^{1,2 \cdots T}| x^0)}{q(x^T|x^0)} \cdot q(x^T|x^0) \cdot \log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)}\Big) \\
&=\int dx^{1,2\cdots T} \cdot \underbrace{ q(x^{1,2 \cdots T-1}| x^0, x^T) }_{q(x^{1,2 \cdots T}| x^0)=q(x^{T}|x^0) \cdot q(x^{1,2 \cdots T-1}| x^0, x^T)} \cdot q(x^T|x^0) \cdot \log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)}\Big) \\
&=\int \Bigg( \underbrace{ \int q(x^{1,2 \cdots T-1}| x^0, x^T) \cdot \prod_{k=1}^{T-1} dx^k }_{\text{the multiple integral is split into iterated integrals, and this factor}=1} \Bigg) \cdot q(x^T|x^0) \cdot \log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)} \Big) \cdot dx^{T} \\
&=\int q(x^T|x^0) \cdot \log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)} \Big) \cdot dx^{T} \\
&=E_{x^T\sim q(x^T|x^0)} \log \Big( \frac{q(x^{T}|x^0)}{ p(x^T)} \Big)\\
&= KL\Big(q(x^T|x^0)||p(x^T)\Big)
\end{split}
\end{equation}
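The key move in the $L_1$ derivation is that the integrand depends only on $x^T$, so every intermediate variable integrates out. The sketch below compares the expectation over full trajectories with the KL divergence of the $x^T$ marginal on a hypothetical two-state chain (`Q`, `pT`, and `T` are illustrative assumptions).

```python
import itertools
import math

Q = [[0.9, 0.1], [0.2, 0.8]]   # Q[a][b] = q(x^t = b | x^{t-1} = a), illustrative
pT = [0.5, 0.5]                # p(x^T), illustrative
T = 3

def q_xT_given_x0(x0):
    """q(x^T | x^0): push a point mass at x0 through T forward steps."""
    dist = [1.0 if s == x0 else 0.0 for s in range(2)]
    for _ in range(T):
        dist = [sum(dist[a] * Q[a][b] for a in range(2)) for b in range(2)]
    return dist

def l1_full_expectation(x0):
    """Expectation over whole trajectories x^1..x^T of log(q(x^T|x^0) / p(x^T))."""
    qT = q_xT_given_x0(x0)
    total = 0.0
    for rest in itertools.product(range(2), repeat=T):
        prob, prev = 1.0, x0
        for cur in rest:
            prob *= Q[prev][cur]
            prev = cur
        total += prob * math.log(qT[rest[-1]] / pT[rest[-1]])
    return total

def l1_kl(x0):
    """KL(q(x^T | x^0) || p(x^T)) computed from the x^T marginal alone."""
    qT = q_xT_given_x0(x0)
    return sum(qT[s] * math.log(qT[s] / pT[s]) for s in range(2))

print(l1_full_expectation(0), l1_kl(0))
```

Both functions should return the same value up to floating-point rounding, which mirrors the collapse from a $T$-fold integral to a single integral over $x^T$.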
Next consider the second term $L_2$.
\begin{equation}
\begin{split}
L_2&=E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\sum_{t=2}^{T} \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big)\Bigg)\\
&=\sum_{t=2}^{T} E_{x^{1,2, \cdots T} \sim q(x^{1,2 \cdots T} | x^0)} \Bigg(\log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big)\Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int dx^{1,2\cdots T} \cdot q(x^{1,2 \cdots T}| x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int dx^{1,2\cdots T} \cdot \frac{ q(x^{1,2 \cdots T}| x^0)}{q(x^{t-1}|x^t,x^0)} \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int dx^{1,2\cdots T} \cdot \underbrace{ \frac{q(x^{0,1,2\cdots T})}{q(x^0)}}_{q(x^{0,1,2\cdots T})=q(x^0)\cdot q(x^{1,2 \cdots T}| x^0)} \cdot \underbrace{ \frac{q(x^t,x^0)}{q(x^t,x^{t-1},x^0)}}_{q(x^t,x^{t-1},x^0)=q(x^t,x^0)\cdot q(x^{t-1}|x^t,x^0)} \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int dx^{1,2\cdots T} \cdot \frac{q(x^{0,1,2\cdots T})}{q(x^0)}\cdot \frac{q(x^t,x^0)}{q(x^{t-1},x^0)\cdot q(x^t|x^{t-1},x^0)} \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int \bigg( \int \frac{q(x^{0,1,2\cdots T})}{q(x^0)}\cdot \frac{q(x^t,x^0)}{q(x^{t-1},x^0)\cdot q(x^t|x^{t-1},x^0)} \prod_{k\geq1 ,k\neq t-1} dx^k \bigg) \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int \bigg( \int \frac{q(x^{0,1,2\cdots T})}{q(x^{t-1},x^0)}\cdot \frac{q(x^t,x^0)}{q(x^0)\cdot q(x^t|x^{t-1},x^0)} \prod_{k\geq1 ,k\neq t-1} dx^k \bigg) \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int \bigg( \underbrace{ \int q(x^{k:k\geq1,k\neq t-1}|x^{t-1},x^0)}_{q(x^{0,1,2\cdots T})=q(x^{t-1},x^0)\cdot q(x^{k:k\geq1,k\neq t-1}|x^{t-1},x^0)} \cdot \underbrace {\frac{q(x^t|x^0)}{ q(x^t|x^{t-1},x^0)}}_{q(x^t,x^0)=q(x^0)\cdot q(x^t|x^0)} \prod_{k\geq1 ,k\neq t-1} dx^k \bigg) \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int \bigg( \int q(x^{k:k\geq1,k\neq t-1}|x^{t-1},x^0)\cdot \underbrace {\frac{q(x^t|x^0)}{ q(x^t|x^{t-1},x^0)}}_{=1} \prod_{k\geq1 ,k\neq t-1} dx^k \bigg) \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int \bigg( \int q(x^{k:k\geq1,k\neq t-1}|x^{t-1},x^0)\cdot \prod_{k\geq1 ,k\neq t-1} dx^k \bigg) \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int \bigg( \underbrace{ \int q(x^{k:k\geq1,k\neq t-1}|x^{t-1},x^0)\cdot \prod_{k\geq1 ,k\neq t-1} dx^k }_{=1}\bigg) \cdot q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( \int q(x^{t-1}|x^t,x^0) \cdot \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) dx^{t-1} \Bigg)\\
&=\sum_{t=2}^{T} \Bigg( E_{x^{t-1}\sim q(x^{t-1}|x^t,x^0)} \log \Big(\frac{q(x^{t-1}|x^t,x^0)}{ p(x^{t-1}|x^t)} \Big) \Bigg)\\
&=\sum_{t=2}^{T}KL\bigg(q(x^{t-1}|x^t,x^0)||p(x^{t-1}|x^t) \bigg)
\end{split}
\end{equation}
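Each summand of $L_2$ compares the forward posterior $q(x^{t-1}|x^t,x^0)$ with the reverse step $p(x^{t-1}|x^t)$. The posterior follows from the identities used earlier, $q(x^{t-1}|x^t,x^0)=q(x^t|x^{t-1})\cdot q(x^{t-1}|x^0)/q(x^t|x^0)$. The following is a rough sketch of one such KL term for a fixed $x^t$, assuming a hypothetical two-state chain; `Q` and `P` are illustrative numbers, not learned quantities.

```python
import math

Q = [[0.9, 0.1], [0.2, 0.8]]   # Q[a][b] = q(x^t = b | x^{t-1} = a), illustrative
P = [[0.6, 0.4], [0.3, 0.7]]   # P[a][b] = p(x^{t-1} = b | x^t = a), illustrative

def q_marginal(x0, t):
    """q(x^t | x^0) after t forward steps starting from x0."""
    dist = [1.0 if s == x0 else 0.0 for s in range(2)]
    for _ in range(t):
        dist = [sum(dist[a] * Q[a][b] for a in range(2)) for b in range(2)]
    return dist

def q_posterior(xt, x0, t):
    """q(x^{t-1} | x^t, x^0) = q(x^t | x^{t-1}) q(x^{t-1} | x^0) / q(x^t | x^0)."""
    prev = q_marginal(x0, t - 1)
    cur = q_marginal(x0, t)
    return [Q[s][xt] * prev[s] / cur[xt] for s in range(2)]

def kl_term(xt, x0, t):
    """KL(q(x^{t-1} | x^t, x^0) || p(x^{t-1} | x^t)) for one fixed x^t."""
    post = q_posterior(xt, x0, t)
    return sum(post[s] * math.log(post[s] / P[xt][s])
               for s in range(2) if post[s] > 0)

print(q_posterior(0, 0, 2), kl_term(0, 0, 2))
```

The posterior is a proper distribution over $x^{t-1}$, and the KL term is zero only when the reverse step reproduces it exactly.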
Therefore
\begin{equation}
\begin{split}
L&:=L_1+L_2-L_3 \\
&=KL\Big(q(x^T|x^0)||p(x^T)\Big) + \sum_{t=2}^{T}KL\bigg(q(x^{t-1}|x^t,x^0)||p(x^{t-1}|x^t) \bigg)-\log \Big(p(x^{0}|x^1)\Big)
\end{split}
\end{equation}