VideoGPT:使用VQ-VAE和Transformers的视频生成

1 Title

VideoGPT: Video Generation using VQ-VAE and Transformers(Wilson Yan,Yunzhi Zhang ,Pieter Abbeel,Aravind Srinivas)

2 Conlusion

This paper present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos. VideoGPT uses VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings.

3 Good Sentences

1、High-fidelity natural videos is one notable modality that has not seen the same level of progress in generative modeling as compared to images, audio, and text. This is reasonable since the complexity of natural videos requires modeling correlations across both space and time with much higher input dimensions. Video modeling is therefore a natural next challenge for current deep generative models. (The significance of this work)

2、The above line of reasoning leads us to our proposed model:VideoGPT, a simple video generation architecture that is a minimal adaptation of VQ-VAE and GPT architectures for videos.(The reason for choosing VideoGPT)

3、Although the VQ-VAE is trained unconditionally, we can generate conditional samples by training a conditional prior. We use two types of conditioning:Cross Attention and Conditional Norms.(How to transform unconditional to conditional learning)


背景知识

VQ-VAE

VQ-VAE能利用codebook机制把图像编码成离散向量

Method

整个训练过程如图所示,分为两个部分,训练VQ-VAE(左)和训练隐空间中的自回归Transformer(右)

第一阶段与原始VQ-VAE训练过程类似。

第二阶段,VQ-VAE将视频数据编码为隐序列作为先验模型的训练数据。首先从先验中采样隐序列,然后使用VQ-VAE将隐序列解码为视频样本。(Transformer的作用是引入条件,这里可以使用交叉注意力或者Conditional Norms:)

相关推荐
ZZZ_O^O21 分钟前
二分查找算法——寻找旋转排序数组中的最小值&点名
数据结构·c++·学习·算法·二叉树
CV-King44 分钟前
opencv实战项目(三十):使用傅里叶变换进行图像边缘检测
人工智能·opencv·算法·计算机视觉
禁默1 小时前
2024年计算机视觉与艺术研讨会(CVA 2024)
人工智能·计算机视觉
代码雕刻家1 小时前
数据结构-3.9.栈在递归中的应用
c语言·数据结构·算法
雨中rain1 小时前
算法 | 位运算(哈希思想)
算法
春末的南方城市2 小时前
FLUX的ID保持项目也来了! 字节开源PuLID-FLUX-v0.9.0,开启一致性风格写真新纪元!
人工智能·计算机视觉·stable diffusion·aigc·图像生成
Kalika0-03 小时前
猴子吃桃-C语言
c语言·开发语言·数据结构·算法
sp_fyf_20243 小时前
计算机前沿技术-人工智能算法-大语言模型-最新研究进展-2024-10-02
人工智能·神经网络·算法·计算机视觉·语言模型·自然语言处理·数据挖掘
我是哈哈hh5 小时前
专题十_穷举vs暴搜vs深搜vs回溯vs剪枝_二叉树的深度优先搜索_算法专题详细总结
服务器·数据结构·c++·算法·机器学习·深度优先·剪枝
Tisfy5 小时前
LeetCode 2187.完成旅途的最少时间:二分查找
算法·leetcode·二分查找·题解·二分