VideoGPT: Video Generation using VQ-VAE and Transformers

1 Title

VideoGPT: Video Generation using VQ-VAE and Transformers (Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas)

2 Conclusion

This paper presents VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings.

3 Good Sentences

1. High-fidelity natural videos is one notable modality that has not seen the same level of progress in generative modeling as compared to images, audio, and text. This is reasonable since the complexity of natural videos requires modeling correlations across both space and time with much higher input dimensions. Video modeling is therefore a natural next challenge for current deep generative models. (The significance of this work)

2. The above line of reasoning leads us to our proposed model: VideoGPT, a simple video generation architecture that is a minimal adaptation of VQ-VAE and GPT architectures for videos. (The reason for choosing VideoGPT)

3. Although the VQ-VAE is trained unconditionally, we can generate conditional samples by training a conditional prior. We use two types of conditioning: Cross Attention and Conditional Norms. (How to go from an unconditional model to conditional generation)
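To make the "Conditional Norms" idea in sentence 3 concrete, here is a minimal sketch. It assumes one common formulation, in which the gain and bias of a layer normalization are affine functions of the conditioning vector. The function names, the tiny weight matrices, and the one-hot class vectors are all hypothetical and chosen only for illustration; this is not the authors' code.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def affine(weight, bias, cond):
    """Compute weight @ cond + bias for a list-of-rows weight matrix."""
    return [sum(w * c for w, c in zip(row, cond)) + b
            for row, b in zip(weight, bias)]


def conditional_layer_norm(x, cond, gain_w, gain_b, shift_w, shift_b):
    """Layer norm whose gain and shift are predicted from the condition."""
    gain = affine(gain_w, gain_b, cond)
    shift = affine(shift_w, shift_b, cond)
    return [g * n + s for g, n, s in zip(gain, layer_norm(x), shift)]


# Hypothetical example: a 3-dim activation and a 2-class one-hot condition.
x = [1.0, 2.0, 3.0]
gain_w = [[0.0, 0.0]] * 3      # gain ignores the condition here
gain_b = [1.0] * 3
shift_w = [[0.5, -0.5]] * 3    # class 0 shifts up, class 1 shifts down
shift_b = [0.0] * 3

out_class0 = conditional_layer_norm(x, [1.0, 0.0], gain_w, gain_b, shift_w, shift_b)
out_class1 = conditional_layer_norm(x, [0.0, 1.0], gain_w, gain_b, shift_w, shift_b)
```

The same activation is normalized identically in both calls; only the condition-dependent gain and shift differ, which is how the class information reaches the output.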


Background

VQ-VAE

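VQ-VAE uses a codebook mechanism to encode an image into discrete vectors: each encoder output vector is replaced by its nearest codebook entry.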
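The codebook lookup can be illustrated with a small dependency-free sketch. It assumes nearest-neighbor matching under squared Euclidean distance; the three-entry codebook and the encoder outputs are made-up values, and `quantize` / `dequantize` are hypothetical helper names.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, code)) for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Replace each discrete index with its codebook embedding."""
    return [codebook[i] for i in indices]


# Hypothetical 3-entry codebook of 2-dim embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Made-up continuous encoder outputs, one vector per latent position.
encoder_outputs = [[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]]

codes = quantize(encoder_outputs, codebook)
print(codes)  # [1, 0, 2]
```

The list `codes` is the discrete representation that a prior model would later be trained on, and `dequantize(codes, codebook)` gives the embeddings a decoder would receive.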

Method

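As shown in the figure, the overall training process has two parts: training the VQ-VAE (left) and training the autoregressive Transformer in the latent space (right).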

The first stage is similar to the original VQ-VAE training procedure.
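For reference, the original VQ-VAE objective that the first stage resembles is usually written as below. The notation is an assumption of this note: $E$ is the encoder, $D$ the decoder, $e$ the nearest codebook embedding to $E(x)$, $\mathrm{sg}$ the stop-gradient operator, and $\beta$ the commitment weight.

```latex
\mathcal{L} =
  \underbrace{\lVert x - D(e) \rVert_2^2}_{\text{reconstruction}}
+ \underbrace{\lVert \mathrm{sg}[E(x)] - e \rVert_2^2}_{\text{codebook}}
+ \underbrace{\beta \, \lVert \mathrm{sg}[e] - E(x) \rVert_2^2}_{\text{commitment}}
```

The reconstruction term trains the encoder and decoder, the codebook term pulls the embeddings toward the encoder outputs, and the commitment term keeps the encoder outputs close to the embeddings they select.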

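In the second stage, the trained VQ-VAE encodes the video data into latent sequences, which serve as training data for the prior model. To generate a video, a latent sequence is first sampled from the prior and then decoded into a video sample by the VQ-VAE. (The Transformer prior is also where conditioning is introduced, using either cross attention or conditional norms.)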
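The sample-then-decode procedure of the second stage can be sketched as follows. This is a toy illustration, not the paper's implementation: `toy_prior` is a hypothetical stand-in for the Transformer (it simply favors repeating the previous code), and `toy_decode` is a hypothetical stand-in for the VQ-VAE decoder (a plain codebook lookup).

```python
import random


def sample_latents(prior, seq_len, rng):
    """Draw code indices one position at a time from an autoregressive prior."""
    sequence = []
    for _ in range(seq_len):
        # Distribution over codebook indices given the prefix sampled so far.
        probs = prior(sequence)
        next_code = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        sequence.append(next_code)
    return sequence


def toy_prior(prefix, num_codes=4):
    """Hypothetical prior: uniform at the start, then favors the previous code."""
    if not prefix:
        return [1.0 / num_codes] * num_codes
    probs = [0.1] * num_codes
    probs[prefix[-1]] = 0.7
    return probs


def toy_decode(sequence, codebook):
    """Hypothetical decoder: look up the embedding of each sampled code."""
    return [codebook[i] for i in sequence]


rng = random.Random(0)                      # fixed seed for repeatability
codebook = [[0.0], [0.25], [0.5], [1.0]]    # made-up 4-entry codebook
latents = sample_latents(toy_prior, seq_len=8, rng=rng)
video = toy_decode(latents, codebook)
```

In the real model the prefix would be the flattened spatio-temporal grid of latent codes, and the decoder would map the sampled grid back to video frames.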
