Exploring Text-to-Video Generation in the Post-Sora Era

## 1. Preface

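By convention, this is where one would write at length about how Sora's release has affected every industry. Plenty of articles already do that, so today we mainly look at solutions that are already mature and results that are ready to use, and on that basis consider what we can do in the post-Sora era.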

Applications covered in this article that you can try:

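\[1\] Stable Video Diffusion Hugging Face Space (image-to-video only): [https://huggingface.co/spaces/multimodalart/stable-video-diffusion](https://huggingface.co/spaces/multimodalart/stable-video-diffusion "https://huggingface.co/spaces/multimodalart/stable-video-diffusion")

\[2\] 小诺AI (text-to-video and image-to-video): search for the mini program 小诺AI in WeChat

\[3\] Pika (text-to-video): [https://pika.art/](https://pika.art/ "https://pika.art/")

\[4\] Open-Sora (text-to-video): [GitHub - hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All](https://github.com/hpcaitech/Open-Sora "GitHub - hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All")

## 2. Current Video Generation Solutions at a Glance

I have recently looked at many open-source video generation projects \[1\], \[4\] and closed-source, ready-to-use products \[2\], \[3\]. My immediate impression is that everyone is eager to see Sora-style technology put to use. Only a month after Sora was announced, many text-to-video products are already available to try directly. A few of the more popular ones are introduced here.

### Pika

Pika \[3\] predates Sora and was a bold early attempt. Although skepticism toward Pika has grown since Sora's release, Pika is undeniably a pioneer that has achieved good results, and it is still being iterated on. With the lessons from Sora, Pika may well deliver bigger surprises later, and I remain hopeful.

Here is an example from the official site.

![](https://file.jishuzhan.net/article/1770625139101143042/9383561d0e5b8d06e8c8536d1599328a.webp)

Prompt: 3d animation, a cute boy is standing in a house, spring festival interior, lunar new year, holiday.
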
### Stable Video Diffusion

Stable Video Diffusion \[1\] (SVD) also predates Sora. Unlike Pika, SVD is an open-source project, which is why some argue that Sora borrowed from SVD. There is some merit to this view. Reading the Sora report, some details, such as running diffusion in a compressed latent space, do match SVD, though the backbones differ: Sora is described as a diffusion transformer (DiT), while SVD is built on a U-Net. And since SVD came first, calling it borrowing is not unreasonable. I personally consider SVD the highest-quality video generation solution currently available. The input it accepts is an image. In the example below, the input image is a still of a rocket at launch, and the output video animates the launch.

![](https://file.jishuzhan.net/article/1770625139101143042/bf4080e641354db6082f40376e902219.webp)

Image input: rocket

![](https://file.jishuzhan.net/article/1770625139101143042/089b08fb64150ed224be729204da86a6.webp)

Video output: rocket (the blog cannot embed video, so this is a GIF made from frames sampled from the video)

One issue is worth discussing. Because SVD runs without semantic guidance, its task is only to "animate" the image, and the resulting motion may violate physical laws. Put simply, the rocket above could just as well move horizontally (this happened in my own experiments). There is no way to specify the trajectory of an object in the image or to state which motion is correct; the motion is learned purely from the training dataset.

A natural follow-up for SVD would be to add text input to supply clearer semantics.

### Open-Sora

Open-Sora \[4\] appeared after Sora. It is an open-source project that reproduces Sora based entirely on the published Sora report. The following is quoted from the introduction on the Open-Sora project home page.

The Open-Sora project is an initiative dedicated to producing high-quality video efficiently and to making its model, tools, and content available to everyone.

By embracing open-source principles, Open-Sora not only makes advanced video generation technology accessible at low cost, but also offers a streamlined, user-friendly solution that simplifies the complexity of video production.

With Open-Sora, we hope more developers will join us in exploring innovation, creativity, and inclusivity in content creation. The Open-Sora project is at an early stage and will be updated continuously.

Here are some examples to give a direct sense of the results.

![](https://file.jishuzhan.net/article/1770625139101143042/54203e9379e71ac74208145533aa5bc4.webp)

Prompt: A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest.

![](https://file.jishuzhan.net/article/1770625139101143042/179ef5c700d3c66c7b576e8198c82ce9.webp)

Prompt: A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.

![](https://file.jishuzhan.net/article/1770625139101143042/f6dc78cd981e6e53d4587d32ce6ba644.webp)

Prompt: The video captures the majestic beauty of a waterfall cascading down a cliff into a serene lake. The waterfall, with its powerful flow, is the central focus of the video. The surrounding landscape is lush and green, with trees and foliage adding to the natural beauty of the scene. The camera angle provides a bird's eye view of the waterfall, allowing viewers to appreciate the full height and grandeur of the waterfall. The video is a stunning representation of nature's power and beauty.

![](https://file.jishuzhan.net/article/1770625139101143042/e0ca12e0936fc19edc2fcece7edca69b.webp)

Prompt: A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell, is the main focus of the video, swimming gracefully towards the right side of the frame. The coral reef, teeming with life, is visible in the background, providing a vibrant and colorful backdrop to the turtle's journey. Several small fish, darting around the turtle, add a sense of movement and dynamism to the scene. The video is shot from a slightly elevated angle, providing a comprehensive view of the turtle's surroundings. The overall style of the video is calm and peaceful, capturing the beauty and tranquility of the underwater world.

### 小诺AI

小诺AI \[2\] is a WeChat mini program released by [舒笔科技](http://www.aishubi.com/ "舒笔科技"). It is a **text-to-video** product built on an optimized SVD. On top of SVD, 小诺AI adds support for text prompt input, connecting the full text-to-video pipeline so that what users write is what they get.

The product currently accepts English input only, but that is not a problem: 小诺AI also includes a prompt generation feature that translates Chinese prompts into English. The same feature can also be used to expand a prompt.

Here are some examples.

![](https://file.jishuzhan.net/article/1770625139101143042/d836215b36f57c26603c75a37df2423b.webp)

Prompt: The sun is setting by the mountain.

![](https://file.jishuzhan.net/article/1770625139101143042/e1d873157ee7dd0209a4dff307c762e8.webp)

Prompt: breathtaking selfie photograph of astronaut floating in space, earth in the background. award-winning, professional, highly detailed

![](https://file.jishuzhan.net/article/1770625139101143042/569aee34ae230543b2c029b9938d5125.webp)

Prompt: breathtaking night street of city, neon lights. award-winning, professional, highly detailed

![](https://file.jishuzhan.net/article/1770625139101143042/46e7466bb980169126d3819988fd3697.webp)

Prompt: anime artwork an empty classroom. anime style, key visual, vibrant, studio anime, highly detailed

![](https://file.jishuzhan.net/article/1770625139101143042/61eb1e02f26791a70e4034d9ad92723f.webp)

Prompt: a beautiful room

![](https://file.jishuzhan.net/article/1770625139101143042/d24c1698af5e02252bfc3daabd582c32.webp)

Prompt: anime artwork an island surrounding by the sea, dramatic, anime style, key visual, vibrant, studio anime, highly detailed

![](https://file.jishuzhan.net/article/1770625139101143042/91bd2384351991967139db51e624c6ac.webp)

Prompt: concept art of a warrior with a sword, clouds. digital artwork, illustrative, painterly, matte painting, highly detailed, cinematic composition

![](https://file.jishuzhan.net/article/1770625139101143042/409b8d99c9a7b23486c0836a070a5c1c.webp)

Prompt: 16-bit pixel art, a cozy cafe side view, a beautiful day
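
Several of the example prompts above share a visible pattern: a short subject wrapped in a fixed set of style keywords. As a rough illustration of what prompt expansion can look like, here is a minimal sketch of template-based expansion. It is only an assumption about one possible approach and not how 小诺AI actually implements the feature. The names `STYLE_TEMPLATES` and `expand_prompt` are hypothetical, and the templates are inferred from the prompts shown above.

```python
"""Template-based prompt expansion (illustrative sketch only)."""

# Hypothetical style templates, inferred from the example prompts above.
# They are not taken from any product's actual configuration.
STYLE_TEMPLATES = {
    "photo": "breathtaking {subject}. award-winning, professional, highly detailed",
    "anime": (
        "anime artwork {subject}. anime style, key visual, vibrant, "
        "studio anime, highly detailed"
    ),
    "concept_art": (
        "concept art of {subject}. digital artwork, illustrative, painterly, "
        "matte painting, highly detailed, cinematic composition"
    ),
}


def expand_prompt(subject: str, style: str) -> str:
    """Wrap a short subject description in the keywords of the given style."""
    # Drop surrounding whitespace and a trailing period so the template's
    # own punctuation is not doubled.
    subject = subject.strip().rstrip(".")
    if not subject:
        raise ValueError("subject must not be empty")
    if style not in STYLE_TEMPLATES:
        known = ", ".join(sorted(STYLE_TEMPLATES))
        raise ValueError(f"unknown style {style!r}; expected one of: {known}")
    return STYLE_TEMPLATES[style].format(subject=subject)


if __name__ == "__main__":
    print(expand_prompt("an empty classroom", "anime"))
    print(expand_prompt("night street of city, neon lights", "photo"))
```

With these templates, the subject "an empty classroom" in the "anime" style reproduces the classroom prompt shown above. A real system would more likely use a language model for translation and expansion; fixed templates are simply the smallest version of the idea.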
