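StreamMultiDiffusion: real-time, interactive, multi-text-to-image generation from user-specified regional text prompts

StreamMultiDiffusion generates images in real time and interactively from multiple text prompts, each tied to a region that the user specifies.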

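The video shows that the project finally lets you generate large images with fine-grained regional prompt control. Previously this was not practical: at one hour per trial, you could not sample several times to pick the best result, or adjust the generation process until it matched your intent. The project cuts that latency from about one hour to about one minute, which makes the technique usable for creators.

Related links

Project: github.com/ironjr/Stre...
Paper: arxiv.org/abs/2403.09...
Paper title: StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control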

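Paper abstract

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing. Previous work has focused on improving the usability of diffusion models either by reducing inference time or by increasing user interactivity through new fine-grained controls such as region-based text prompts. However, we empirically find that integrating the two branches of work is nontrivial, which limits the potential of diffusion models. To resolve this incompatibility, we present StreamMultiDiffusion, the first real-time region-based text-to-image generation framework. By stabilizing fast inference techniques and restructuring the model into a newly proposed multi-prompt stream batch architecture, we achieve panorama generation 10× faster than existing solutions, and a generation speed of 1.57 FPS for region-based text-to-image synthesis on a single RTX 2080 Ti GPU. Our solution opens up a new paradigm for interactive image generation named semantic palette, in which high-quality images are generated in real time from multiple hand-drawn regions that encode prescribed semantic meanings (e.g., eagle, girl).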

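  • Interactive image generation from scratch with fine-grained regional control. In other words, you paint images with meanings.
  • Prompt separation. Generating two or more objects at once is no longer plagued by unintended content mixing.
  • Real-time image inpainting and editing. You can work on any uploaded photo or artwork you like.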

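Main modules

  • model.StableMultiDiffusionPipeline: for single-call generation (probably a better fit for CLI users).
  • model.StreamMultiDiffusion: for streaming applications, such as the main figure on the project's README page. Minimal examples of possible applications are given below.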

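Streaming generation

With multi-prompt stream batch, a modification of the original stream batch architecture, this multi-prompt text-to-image generation process can be streamed so that it keeps producing images indefinitely.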
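The idea behind a stream batch can be illustrated with a toy scheduler. This is only a sketch of the general concept (frames at different denoising stages share one batched model call, so that after a warm-up period one frame finishes per call); it is not the project's implementation, and the function name and the integer "steps" are hypothetical stand-ins for real denoising.

```python
from collections import deque


def stream_batch(num_steps, num_ticks):
    """Toy stream batch: one new frame enters per tick, and every
    in-flight frame advances by one denoising step per tick."""
    in_flight = deque()  # entries are [frame_id, steps_done]
    finished = []        # (tick, frame_id) for each completed frame
    next_id = 0
    for tick in range(num_ticks):
        # Admit a new frame, so the batch holds frames at different stages.
        in_flight.append([next_id, 0])
        next_id += 1
        # One "batched model call": all frames in flight advance together.
        for frame in in_flight:
            frame[1] += 1
        # Only the oldest frame can have reached the final step.
        if in_flight[0][1] == num_steps:
            finished.append((tick, in_flight.popleft()[0]))
    return finished


done = stream_batch(num_steps=4, num_ticks=8)
print(done)
```

With four steps per frame, the first frame completes on the fourth tick and every later tick completes one more frame, which is the behavior a streaming application relies on.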

Results

Code

import torch
from util import seed_everything, Streamer
from model import StreamMultiDiffusion
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
import time
import imageio # This is not included in our requirements.txt!
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
height = 768
width = 512
 
# Load the module.
device = torch.device(f'cuda:{device}')
smd = StreamMultiDiffusion(
    device,
    hf_key='ironjr/BlazingDriveV11m',
    sd_version='1.5',
    height=height,
    width=width,
    cfg_type='none',
    autoflush=True,
    use_tiny_vae=True,
    mask_type='continuous',
    bootstrap_steps=2,
    bootstrap_mix_steps=1.5,
    seed=seed,
)
 
# Load the masks.
masks = []
for i in range(1, 3):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/zeus/prompt_p{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
 
# Register a background, prompts, and masks (this can be called multiple times).
smd.update_background(Image.new(size=(width, height), mode='RGB', color=(255, 255, 255)))
smd.update_single_layer(
    idx=0,
    prompt='a photo of Mount Olympus',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=background,
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
smd.update_single_layer(
    idx=1,
    prompt='1girl, looking at viewer, lifts arm, smile, happy, Greek goddess Athena',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=masks[0],
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
smd.update_single_layer(
    idx=2,
    prompt='a small, sitting owl',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=masks[1],
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
 
 
# Generate images... forever.
# while True:
#     image = smd()
#     image.save(f'{str(int(time.time() % 100000))}.png') # Warning: this fills up your disk quickly.
#     display(image) # Works if `from IPython.display import display` has been run.
#
#     # You can also step in mid-generation by updating the background, prompts, or masks.
#     smd.update_single_layer(
#         idx=2,
#         prompt='a small, sitting owl',
#         negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
#         mask=masks[1],
#         mask_strength=1.0,
#         mask_std=0.0,
#         prompt_strength=1.0,
#     )
 
# Or make a video/gif from your generation stream (requires `imageio`)
frames = []
for _ in range(50):
    image = smd()
    frames.append(image)
imageio.mimsave('my_beautiful_creation.gif', frames, loop=0) 
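The mask handling in the listing above comes down to two steps: threshold the alpha channel at 0.5 to get a binary mask, then treat every pixel that no mask covers as background. A dependency-free sketch on tiny hand-made alpha grids follows; the grids and helper names are hypothetical, and the listing itself performs the same operations on torch tensors.

```python
def binarize(alpha, threshold=0.5):
    """Turn an alpha grid into a 0/1 mask, like `(alpha > 0.5).float()`."""
    return [[1.0 if a > threshold else 0.0 for a in row] for row in alpha]


def background_of(masks):
    """A pixel is background only if every mask is zero there."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [all(m[y][x] == 0 for m in masks) for x in range(width)]
        for y in range(height)
    ]


# Two hypothetical 2x3 alpha channels, one per hand-drawn region.
alpha_a = [[0.9, 0.0, 0.0], [0.8, 0.2, 0.0]]
alpha_b = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.6]]

toy_masks = [binarize(alpha_a), binarize(alpha_b)]
toy_background = background_of(toy_masks)
print(toy_background)
```

Only the middle column ends up as background here, because the left column is claimed by the first region and the right column by the second.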

Region-based multi-text-to-image generation

Images of any size can be generated from any number of prompt-mask pairs.
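One common way to combine per-prompt results under region masks is a mask-weighted average, the kind of aggregation used by MultiDiffusion-style methods. The sketch below applies it to a one-dimensional "image" with two prompts. It illustrates the principle only, not the project's code, and all names and values are hypothetical.

```python
def blend(values, masks):
    """Mask-weighted average of per-prompt values at each position."""
    blended = []
    for x in range(len(values[0])):
        weight = sum(m[x] for m in masks)
        total = sum(m[x] * v[x] for v, m in zip(values, masks))
        # Positions covered by no mask fall back to 0.0.
        blended.append(total / weight if weight > 0 else 0.0)
    return blended


# Per-prompt "pixel values" and soft masks over four positions.
values = [[1.0, 1.0, 1.0, 1.0], [5.0, 5.0, 5.0, 5.0]]
masks = [[1.0, 1.0, 0.5, 0.0], [0.0, 0.0, 0.5, 1.0]]

blended = blend(values, masks)
print(blended)
```

Where a single mask dominates, its prompt's value passes through unchanged; where masks overlap with equal weight, the result is the midpoint of the two.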

Results

Code

import torch
from model import StableMultiDiffusionPipeline
from util import seed_everything
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
 
# Load the module.
seed_everything(seed)
device = torch.device(f'cuda:{device}')
smd = StableMultiDiffusionPipeline(
    device,
    hf_key='ironjr/BlazingDriveV11m',
    sd_version='1.5',
)
 
# Load prompts.
prompts = [
    # Background prompt.
    '1girl, 1boy, times square',
    # Foreground prompts.
    '1boy, looking at viewer, brown hair, casual shirt',
    '1girl, looking at viewer, pink hair, leather jacket',
]
negative_prompts = [
    '',
    '1girl', # (Optional) The first foreground prompt is a boy, so we don't want a girl.
    '1boy', # (Optional) The second foreground prompt is a girl, so we don't want a boy.
]
negative_prompt_prefix = 'worst quality, bad quality, normal quality, cropped, framed'
negative_prompts = [negative_prompt_prefix + ', ' + p for p in negative_prompts]
 
# Load masks.
masks = []
for i in range(1, 3):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/timessquare/timessquare_{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
masks = torch.stack([background] + masks, dim=0).float()
 
height, width = masks.shape[-2:] # (768, 768) in this example.
 
# Sample an image.
image = smd(
    prompts,
    negative_prompts,
    masks=masks,
    mask_strengths=1,
    mask_stds=0,
    height=height,
    width=width,
    bootstrap_steps=2,
)
image.save('my_beautiful_creation.png')
 

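Larger region-based multi-text-to-image generation

With the original MultiDiffusion pipeline and a 50-step DDIM sampler, the code below takes about one hour to run; StreamMultiDiffusion reduces this to about one minute.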
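A back-of-the-envelope sketch gives a feel for why a wide canvas is expensive for a sliding-window method. It counts how many overlapping latent windows cover a 768×1920 image and multiplies by the 50 sampler steps mentioned above. The window size (64 latent pixels) and stride (8) are assumed values chosen for illustration, not settings taken from this project; the factor of 8 is the VAE downsampling of Stable Diffusion 1.5.

```python
def count_views(height, width, window=64, stride=8, vae_scale=8):
    """Number of sliding windows needed to cover the latent canvas."""
    lat_h, lat_w = height // vae_scale, width // vae_scale
    rows = (lat_h - window) // stride + 1 if lat_h > window else 1
    cols = (lat_w - window) // stride + 1 if lat_w > window else 1
    return rows * cols


views = count_views(768, 1920)  # canvas size of the example below
passes = views * 50             # one window pass per view per sampler step
print(views, passes)
```

Under these assumptions a single 512×512 image needs one window per step, while the wide canvas needs more than a hundred, before even accounting for multiple prompts.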

Results

Code

import torch
from model import StableMultiDiffusionPipeline
from util import seed_everything
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
 
# Load the module.
seed_everything(seed)
device = torch.device(f'cuda:{device}')
smd = StableMultiDiffusionPipeline(device)
 
# Load prompts.
prompts = [
    # Background prompt.
    'clear deep blue sky',
    # Foreground prompts.
    'summer mountains',
    'the sun',
    'the moon',
    'a giant waterfall',
    'a giant waterfall',
    'clean deep blue lake',
    'a large tree',
    'a large tree',
]
negative_prompts = ['worst quality, bad quality, normal quality, cropped, framed'] * len(prompts)
 
# Load masks.
masks = []
for i in range(1, 9):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/irworobongdo/irworobongdo_{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
masks = torch.stack([background] + masks, dim=0).float()
 
height, width = masks.shape[-2:] # (768, 1920) in this example.
 
# Sample an image.
image = smd(
    prompts,
    negative_prompts,
    masks=masks,
    mask_strengths=1,
    mask_stds=0,
    height=height,
    width=width,
    bootstrap_steps=2,
)
image.save('my_beautiful_creation.png')
 
