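StreamMultiDiffusion: Real-Time, Interactive Multi-Text-to-Image Generation from User-Specified Regional Text Prompts

StreamMultiDiffusion generates images in real time and interactively from multiple text prompts, each assigned to a user-specified region.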

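The demo video shows that this project finally lets you generate large images with fine-grained regional prompt control. Previously this was simply not feasible: at an hour per trial, you could not sample several times to pick the best result, or tune the generation process until it matched your intent. The project has cut the latency from one hour to one minute, which makes the technique practical for creators.

Related links

Project: github.com/ironjr/Stre...
Paper: arxiv.org/abs/2403.09...
Paper title: StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control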

Paper abstract

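The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing. Previous works have focused on improving the usability of diffusion models either by reducing inference time or by increasing user interactivity through new, fine-grained controls such as region-based text prompts. However, we empirically find that integrating both branches of work is nontrivial, which limits the potential of diffusion models. To solve this incompatibility, we present StreamMultiDiffusion, the first real-time region-based text-to-image generation framework. By stabilizing fast inference techniques and restructuring the model into a newly proposed multi-prompt stream batch architecture, we achieve ×10 faster panorama generation than existing solutions, and a generation speed of 1.57 FPS in region-based text-to-image synthesis on a single RTX 2080 Ti GPU. Our solution opens up a new paradigm for interactive image generation named semantic palette, where high-quality images are generated in real time from multiple hand-drawn regions that encode prescribed semantic meanings (e.g., eagle, girl).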

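  • Interactive image generation from scratch with fine-grained region control. In other words, you paint images with meanings.
  • Prompt separation. Generating two or more objects at the same time is no longer plagued by unintended content mixing.
  • Real-time image painting and editing. You can work from any uploaded photo or piece of art you want.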

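Main modules

  • model.StableMultiDiffusionPipeline: for single-call generation (probably the better fit for CLI users).
  • model.StreamMultiDiffusion: for streaming applications, such as the main demo on the project's README page. Minimal examples of possible applications are given below.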

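Streaming generation process

With the multi-prompt stream batch, a modification of the original stream batch architecture, the multi-prompt text-to-image generation process can be streamed so that it keeps generating images indefinitely.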
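The pipelining idea behind a stream batch can be illustrated with a toy sketch. This is not the project's implementation: `denoise_one_step`, `stream_batch`, and the four-step schedule are hypothetical stand-ins, and prompts and masks are ignored entirely. The sketch only assumes that latents at different denoising steps share one batched model call, so that after a short warm-up every call emits one finished result.

```python
import itertools
from collections import deque

NUM_STEPS = 4  # assumed number of denoising steps per image


def denoise_one_step(latent, step):
    """Hypothetical stand-in for one U-Net + scheduler step: halves the 'noise'."""
    return latent * 0.5


def stream_batch(noise_source, num_calls, num_steps=NUM_STEPS):
    """Run `num_calls` batched calls and collect the latents that finish."""
    in_flight = deque()  # entries are (latent, steps_done)
    outputs = []
    for _ in range(num_calls):
        # Admit one fresh noisy latent per call.
        in_flight.append((next(noise_source), 0))
        # One batched "model call": every in-flight latent advances by one
        # step, each at its own position in the denoising schedule.
        in_flight = deque(
            (denoise_one_step(latent, done), done + 1)
            for latent, done in in_flight
        )
        # The oldest latent leaves the batch once it has completed all steps.
        if in_flight[0][1] == num_steps:
            outputs.append(in_flight.popleft()[0])
    return outputs


outputs = stream_batch(itertools.repeat(16.0), num_calls=10)
print(len(outputs))  # 7: the first three calls only fill the pipeline
```

After the warm-up, the batch always holds `NUM_STEPS` latents, which is why throughput is one image per call even though each image still goes through every step.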

Results

Code

import torch
from util import seed_everything, Streamer
from model import StreamMultiDiffusion
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
import time
import imageio # This is not included in our requirements.txt!
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
height = 768
width = 512
 
# Load the module.
device = torch.device(f'cuda:{device}')
smd = StreamMultiDiffusion(
    device,
    hf_key='ironjr/BlazingDriveV11m',
    sd_version='1.5',
    height=height,
    width=width,
    cfg_type='none',
    autoflush=True,
    use_tiny_vae=True,
    mask_type='continuous',
    bootstrap_steps=2,
    bootstrap_mix_steps=1.5,
    seed=seed,
)
 
# Load the masks.
masks = []
for i in range(1, 3):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/zeus/prompt_p{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
 
# Register a background, prompts, and masks (this can be called multiple times).
smd.update_background(Image.new(size=(width, height), mode='RGB', color=(255, 255, 255)))
smd.update_single_layer(
    idx=0,
    prompt='a photo of Mount Olympus',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=background,
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
smd.update_single_layer(
    idx=1,
    prompt='1girl, looking at viewer, lifts arm, smile, happy, Greek goddess Athena',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=masks[0],
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
smd.update_single_layer(
    idx=2,
    prompt='a small, sitting owl',
    negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
    mask=masks[1],
    mask_strength=1.0,
    mask_std=0.0,
    prompt_strength=1.0,
)
 
 
# Generate images... forever.
# while True:
#     image = smd()
#     image.save(f'{str(int(time.time() % 100000))}.png') # This will fill up your disk quickly.
#     display(image) # If `from IPython.display import display` is called.
#
#     You can also intercept the process in the middle of the generation by updating other background, prompts or masks.
#     smd.update_single_layer(
#         idx=2,
#         prompt='a small, sitting owl',
#         negative_prompt='worst quality, bad quality, normal quality, cropped, framed',
#         mask=masks[1],
#         mask_strength=1.0,
#         mask_std=0.0,
#         prompt_strength=1.0,
#     )
 
# Or make a video/gif from your generation stream (requires `imageio`)
frames = []
for _ in range(50):
    image = smd()
    frames.append(image)
imageio.mimsave('my_beautiful_creation.gif', frames, loop=0) 

Region-based multi-text-to-image generation

Generates images of arbitrary size from any number of prompt-mask pairs.
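How prompt-mask pairs line up can be shown on a tiny synthetic example. The sketch below uses NumPy and two made-up 4x4 masks instead of the downloaded PNGs and torch tensors used in the article's listing; the mask shapes and prompt strings are illustrative assumptions. It derives the background as every pixel that no foreground mask covers, then stacks one mask per prompt with the background first.

```python
from functools import reduce

import numpy as np

# Two hypothetical 4x4 foreground masks (1 = region painted by the user).
mask_a = np.zeros((1, 4, 4), dtype=np.float32)
mask_a[:, :, :2] = 1.0  # left half
mask_b = np.zeros((1, 4, 4), dtype=np.float32)
mask_b[:, 2:, 2:] = 1.0  # bottom-right quadrant
foreground = [mask_a, mask_b]

# Background = pixels that no foreground mask covers.
background = reduce(np.logical_and, [m == 0 for m in foreground])

# One mask per prompt: background first, then the foreground masks.
masks = np.stack([background.astype(np.float32)] + foreground, axis=0)

prompts = ["background prompt", "foreground prompt A", "foreground prompt B"]
assert len(prompts) == masks.shape[0]

print(masks.shape)  # (3, 1, 4, 4)
```

Because the background is defined as the complement of the foreground regions, the masks in this example cover every pixel exactly once, and the output size follows directly from the mask size.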

Results

Code

import torch
from model import StableMultiDiffusionPipeline
from util import seed_everything
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
 
# Load the module.
seed_everything(seed)
device = torch.device(f'cuda:{device}')
smd = StableMultiDiffusionPipeline(
    device,
    hf_key='ironjr/BlazingDriveV11m',
    sd_version='1.5',
)
 
# Load prompts.
prompts = [
    # Background prompt.
    '1girl, 1boy, times square',
    # Foreground prompts.
    '1boy, looking at viewer, brown hair, casual shirt',
    '1girl, looking at viewer, pink hair, leather jacket',
]
negative_prompts = [
    '',
    '1girl', # (Optional) The first foreground prompt is a boy, so we don't want a girl.
    '1boy', # (Optional) The second foreground prompt is a girl, so we don't want a boy.
]
negative_prompt_prefix = 'worst quality, bad quality, normal quality, cropped, framed'
negative_prompts = [negative_prompt_prefix + ', ' + p for p in negative_prompts]
 
# Load masks.
masks = []
for i in range(1, 3):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/timessquare/timessquare_{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
masks = torch.stack([background] + masks, dim=0).float()
 
height, width = masks.shape[-2:] # (768, 768) in this example.
 
# Sample an image.
image = smd(
    prompts,
    negative_prompts,
    masks=masks,
    mask_strengths=1,
    mask_stds=0,
    height=height,
    width=width,
    bootstrap_steps=2,
)
image.save('my_beautiful_creation.png')
 

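Larger region-based multi-text-to-image generation

With the original MultiDiffusion pipeline and a 50-step DDIM sampler, the code below takes about an hour to run; the authors have reduced this to about one minute.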
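One reason wide canvases are costly is that MultiDiffusion-style methods denoise many overlapping windows at every sampling step. The sketch below counts such windows for a given image size. The helper `count_views` and its defaults (a 64x64 latent window, a stride of 8 latents, an 8x VAE downscale) are illustrative assumptions, not settings confirmed for this project, and the count alone does not account for the timings quoted above.

```python
def count_views(height, width, window=64, stride=8, vae_scale=8):
    """Count sliding-window views over the latent grid (assumed window/stride)."""
    lat_h, lat_w = height // vae_scale, width // vae_scale
    rows = (lat_h - window) // stride + 1 if lat_h > window else 1
    cols = (lat_w - window) // stride + 1 if lat_w > window else 1
    return rows * cols


print(count_views(512, 512))   # 1
print(count_views(768, 1920))  # 115
```

Under these assumptions, a 768x1920 canvas needs 115 window evaluations per step where a 512x512 image needs one, so reducing the number of sampling steps has a large effect on total runtime.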

Results

Code

import torch
from model import StableMultiDiffusionPipeline
from util import seed_everything
 
# The following packages are imported only for loading the images.
import torchvision.transforms as T
import requests
from functools import reduce
from io import BytesIO
from PIL import Image
 
 
seed = 2024
device = 0
 
# Load the module.
seed_everything(seed)
device = torch.device(f'cuda:{device}')
smd = StableMultiDiffusionPipeline(device)
 
# Load prompts.
prompts = [
    # Background prompt.
    'clear deep blue sky',
    # Foreground prompts.
    'summer mountains',
    'the sun',
    'the moon',
    'a giant waterfall',
    'a giant waterfall',
    'clean deep blue lake',
    'a large tree',
    'a large tree',
]
negative_prompts = ['worst quality, bad quality, normal quality, cropped, framed'] * len(prompts)
 
# Load masks.
masks = []
for i in range(1, 9):
    url = f'https://raw.githubusercontent.com/ironjr/StreamMultiDiffusion/main/assets/irworobongdo/irworobongdo_{i}.png'
    response = requests.get(url)
    mask = Image.open(BytesIO(response.content)).convert('RGBA')
    mask = (T.ToTensor()(mask)[-1:] > 0.5).float()
    masks.append(mask)
# In this example, background is simply set as non-marked regions.
background = reduce(torch.logical_and, [m == 0 for m in masks])
masks = torch.stack([background] + masks, dim=0).float()
 
height, width = masks.shape[-2:] # (768, 1920) in this example.
 
# Sample an image.
image = smd(
    prompts,
    negative_prompts,
    masks=masks,
    mask_strengths=1,
    mask_stds=0,
    height=height,
    width=width,
    bootstrap_steps=2,
)
image.save('my_beautiful_creation.png')
 
