StreamMultiDiffusion: Real-Time, Interactive, Multi-Text-to-Image Generation with User-Specified Regional Text Prompts

StreamMultiDiffusion enables real-time, interactive, multi-text-to-image generation driven by user-specified regional text prompts.

The demo video shows that this project finally lets you generate large images under fine-grained regional prompt control. Previously this was simply impractical: when every trial costs an hour, you cannot sample multiple times to pick the best result, nor tune the generation process to match your intent. This project cuts the latency from an hour to about a minute, making the technique practical for creators!

Related Links

  • Project: github.com/ironjr/Stre...
  • Paper: arxiv.org/abs/2403.09...
  • Paper title: StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper Abstract

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing. Previous work has focused either on improving the usability of diffusion models by reducing inference time, or on increasing user interactivity by allowing new, fine-grained controls such as region-based text prompts. However, we empirically find that integrating these two lines of work is nontrivial, which limits the potential of diffusion models. To resolve this incompatibility, we present StreamMultiDiffusion, the first real-time region-based text-to-image generation framework. By stabilizing fast inference techniques and restructuring the model into a newly proposed multi-prompt stream batch architecture, we achieve ×10 faster panorama generation than existing solutions, and a generation speed of 1.57 FPS for region-based text-to-image synthesis on a single RTX 2080 Ti GPU. Our solution opens up a new paradigm of interactive image generation called the semantic palette, where high-quality images are generated in real time from multiple hand-drawn regions encoding prescribed semantic meanings (e.g., eagle, girl).

  • Interactive image generation from scratch with fine-grained region control. In other words, you paint an image with meanings.
  • Prompt separation. Generating two or more objects at the same time is no longer plagued by unintended content mixing!
  • Real-time image inpainting and editing. Works on any uploaded photo or artwork you like.

Main Modules

  • model.StableMultiDiffusionPipeline: for single-call generation (probably more convenient for CLI users).
  • model.StreamMultiDiffusion: for streaming applications, such as the demo at the top of the README. Minimal examples of the possible applications are given below; a condensed comparison of the two interfaces follows this list.
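
For orientation, here is a condensed sketch of my own (not from the repository; the toy prompts, rectangular placeholder masks, and reliance on default constructor arguments are assumptions; see the full examples below for the complete argument sets) contrasting the two interfaces:

import torch
from PIL import Image
from model import StableMultiDiffusionPipeline, StreamMultiDiffusion

device = torch.device('cuda:0')
height, width = 768, 512

# Toy layout: one rectangular foreground region; everything else is background.
fg = torch.zeros(1, height, width)
fg[:, 200:600, 100:400] = 1.0
bg = (fg == 0).float()

# (a) Single-call generation: one image per call.
pipe = StableMultiDiffusionPipeline(device, sd_version='1.5')
image = pipe(
    ['a sandy beach', 'a red parasol'],  # background prompt first, then foreground
    ['worst quality', 'worst quality'],  # negative prompts, one per layer
    masks=torch.stack([bg, fg], dim=0),
    height=height,
    width=width,
)
image.save('single_call.png')

# (b) Streaming generation: register layers once, then pull frames repeatedly.
stream = StreamMultiDiffusion(device, sd_version='1.5', height=height, width=width)
stream.update_background(Image.new('RGB', (width, height), (255, 255, 255)))
stream.update_single_layer(idx=0, prompt='a sandy beach', negative_prompt='', mask=bg)
stream.update_single_layer(idx=1, prompt='a red parasol', negative_prompt='', mask=fg)
frame = stream()  # call repeatedly for a continuous stream of images
frame.save('stream_frame.png')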

Streaming Generation Process

With multi-prompt stream batching, our modification of the original stream batch architecture, this multi-prompt text-to-image generation process can be streamed to produce images indefinitely.

Results

Code

import torch
from util import seed_everything, Streamer
from model import StreamMultiDiffusion
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
import time
import imageio # This is not included in our requirements.txt!
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
height = 768
width = 512
 
# Load the module.
device = torch.device(f'cuda:{device}')
smd = StreamMultiDiffusion(
    device,
    hf_key='ironjr/BlazingDriveV11m',
    sd_version='1.5',
    height=height,
    width=width,
    cfg_type='none',
    autoflush=True,
    use_tiny_vae=True,
    mask_type='continuous',
    bootstrap_steps=2,
    bootstrap_mix_steps=1.5,
    seed=seed,
)
 
# Load the masks.
masks = []
for i in range(1, 3):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/zeus/prompt_p{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
 
# Register a background, prompts, and masks (this can be called multiple times).
smd.update_background(Image.new(size=(width, height), mode='RGB', color=(255, 255, 255)))
smd.update_single_layer(
    idx=0,
    prompt='a photo of Mount Olympus',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=background,
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
smd.update_single_layer(
    idx=1,
    prompt='1girl, looking at viewer, lifts arm, smile, happy, Greek goddess Athena',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=masks[0],
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
smd.update_single_layer(
    idx=2,
    prompt='a small, sitting owl',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=masks[1],
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
 
 
# Generate images... forever.
# while True:
#     image = smd()
#     image.save(f'{str(int(time.time() % 100000))}.png') # This will take up your hard drive pretty much soon.
#     display(image) # If `from IPython.display import display` is called.
#
#     You can also intercept the process in the middle of the generation by updating other background, prompts or masks.
#     smd.update_single_layer(
#         idx=2,
#         prompt='a small, sitting owl',
#         negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
#         mask=masks[1],
#         mask_strength=1.0,
#         mask_std=0.0,
#         prompt_strength=1.0,
#     )
 
# Or make a video/gif from your generation stream (requires `imageio`)
frames = []
for _ in range(50):
    image = smd()
    frames.append(image)
imageio.mimsave('my_beautiful_creation.gif', frames, loop=0) 
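
If you just want to try the streaming pipeline without downloading the sample mask PNGs, the following sketch (my own, not from the repository; the rectangle coordinates are arbitrary) builds binary masks with the same (1, H, W) float shape that the loading code above produces:

import torch
from functools import reduce

height, width = 768, 512

def rect_mask(top, left, bottom, right):
    """Return a (1, H, W) float mask that is 1.0 inside the given rectangle."""
    mask = torch.zeros(1, height, width)
    mask[:, top:bottom, left:right] = 1.0
    return mask

# Two hand-picked regions standing in for the downloaded prompt masks.
masks = [
    rect_mask(128, 64, 640, 448),  # a large central region (e.g., the goddess)
    rect_mask(48, 320, 208, 480),  # a small corner region (e.g., the owl)
]
# As above, the background is simply everything not covered by a foreground mask.
background = reduce(torch.logical_and, [m == 0 for m in masks])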

Region-Based Multi-Text-to-Image Generation

Generates images of any size from an arbitrary number of prompt-mask pairs.

Results

Code

import torch
from model import StableMultiDiffusionPipeline
from util import seed_everything
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
 
# Load the module.
seed_everything(seed)
device = torch.device(f'cuda:{device}')
smd = StableMultiDiffusionPipeline(
    device,
    hf_key='ironjr/BlazingDriveV11m',
    sd_version='1.5',
)
 
# Load prompts.
prompts = [
    # Background prompt.
    '1girl, 1boy, times square',
    # Foreground prompts.
    '1boy, looking at viewer, brown hair, casual shirt',
    '1girl, looking at viewer, pink hair, leather jacket',
]
negative_prompts = [
    '',
    '1girl', # (Optional) The corresponding prompt is a boy, so we don't want a girl.
    '1boy', # (Optional) The corresponding prompt is a girl, so we don't want a boy.
]
negative_prompt_prefix = 'worst quality, bad quality, normal quality, cropped, framed'
negative_prompts = [negative_prompt_prefix + ', ' + p for p in negative_prompts]
 
# Load masks.
masks = []
for i in range(1, 3):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/timessquare/timessquare_{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
masks = torch.stack([background] + masks, dim=0).float()
 
height, width = masks.shape[-2:] # (768, 768) in this example.
 
# Sample an image.
image = smd(
    prompts,
    negative_prompts,
    masks=masks,
    mask_strengths=1,
    mask_stds=0,
    height=height,
    width=width,
    bootstrap_steps=2,
)
image.save('my_beautiful_creation.png')
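
As a quick sanity check before sampling (my own sketch, not part of the repository), you can render the stacked masks as a label map to confirm that each prompt covers the intended region; layer i of masks pairs with prompts[i], with index 0 being the background:

# masks has shape (3, 1, H, W): background, boy, girl.
label_map = masks.squeeze(1).argmax(dim=0)                       # (H, W), values in {0, 1, 2}
preview = (label_map.float() / (len(prompts) - 1) * 255).byte()  # spread labels over 0-255
Image.fromarray(preview.numpy(), mode='L').save('mask_layout.png')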
 

Larger-Scale Region-Based Multi-Text-to-Image Generation

Running the code below with the original MultiDiffusion pipeline and a 50-step DDIM sampler takes roughly an hour; we have brought it down to about one minute.

Results

Code

import torch
from model import StableMultiDiffusionPipeline
from util import seed_everything
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
 
# Load the module.
seed_everything(seed)
device = torch.device(f'cuda:{device}')
smd = StableMultiDiffusionPipeline(device)
 
# Load prompts.
prompts = [
    # Background prompt.
    'clear deep blue sky',
    # Foreground prompts.
    'summer mountains',
    'the sun',
    'the moon',
    'a giant waterfall',
    'a giant waterfall',
    'clean deep blue lake',
    'a large tree',
    'a large tree',
]
negative_prompts = ['worst quality, bad quality, normal quality, cropped, framed'] * len(prompts)
 
# Load masks.
masks = []
for i in range(1, 9):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/irworobongdo/irworobongdo_{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
masks = torch.stack([background] + masks, dim=0).float()
 
height, width = masks.shape[-2:] # (768, 1920) in this example.
 
# Sample an image.
image = smd(
    prompts,
    negative_prompts,
    masks=masks,
    mask_strengths=1,
    mask_stds=0,
    height=height,
    width=width,
    bootstrap_steps=2,
)
image.save('my_beautiful_creation.png')
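
This example constructs the pipeline with default arguments; as in the earlier examples, a community Stable Diffusion 1.5 checkpoint can be selected explicitly instead (a sketch reusing the same arguments shown above):

# Same constructor call as in the earlier examples, with an explicit checkpoint.
smd = StableMultiDiffusionPipeline(
    device,
    hf_key='ironjr/BlazingDriveV11m',  # community SD 1.5 checkpoint used above
    sd_version='1.5',
)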
 
