区分stable diffusion中的通道数与张量维度

区分stable diffusion中的通道数与张量形状

  • 1.通道数:
    • [1.1 channel = 3](#1.1 channel = 3)
    • [1.2 channel = 4](#1.2 channel = 4)
  • 2.张量形状
    • [2.1 3D 张量](#2.1 3D 张量)
    • [2.2 4D 张量](#2.2 4D 张量)
      • [2.2.1 通常](#2.2.1 通常)
      • [2.2.2 stable diffusion](#2.2.2 stable diffusion)
  • 3.应用
    • [3.1 问题](#3.1 问题)
    • [3.2 举例](#3.2 举例)

前言:通道数与张量形状都在数值3和4之间变换,容易混淆。

1.通道数:

1.1 channel = 3

RGB 图像具有 3 个通道(红色、绿色和蓝色)。

1.2 channel = 4

Stable Diffusion has 4 latent channels。
如何理解卷积神经网络中的通道(channel)

2.张量形状

2.1 3D 张量

形状为 (C, H, W),其中 C 是通道数,H 是高度,W 是宽度。这适用于单个图像。

2.2 4D 张量

2.2.1 通常

形状为 (B, C, H, W),其中 B 是批次大小,C 是通道数,H 是高度,W 是宽度。这适用于多个图像(例如,批量处理)。

2.2.2 stable diffusion

在img2img中,将image用vae编码并按照timestep加噪:

python 复制代码
		# This code copyed from diffusers.pipline_controlnet_img2img.py
        # 6. Prepare latent variables
        latents = self.prepare_latents(
            image,
            latent_timestep,
            batch_size,
            num_images_per_prompt,
            prompt_embeds.dtype,
            device,
            generator,
        )

image的dim(维度)是3,而latents的dim为4。

让我们先看text2img的prepare_latents函数:

python 复制代码
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        if latents is None:
            latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
        else:
            latents = latents.to(device)

        # scale the initial noise by the standard deviation required by the scheduler
        latents = latents * self.scheduler.init_noise_sigma
        return latents

显然,shape已经规定了latents的dim(4)和排列顺序。

在img2img中:

python 复制代码
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.prepare_latents
    def prepare_latents(self, image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None):
        if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
            raise ValueError(
                f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
            )

        image = image.to(device=device, dtype=dtype)

        batch_size = batch_size * num_images_per_prompt

        if image.shape[1] == 4:
            init_latents = image

        else:
            if isinstance(generator, list) and len(generator) != batch_size:
                raise ValueError(
                    f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                    f" size of {batch_size}. Make sure the batch size matches the length of the generators."
                )

            elif isinstance(generator, list):
                init_latents = [
                    self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
                ]
                init_latents = torch.cat(init_latents, dim=0)
            else:
                init_latents = self.vae.encode(image).latent_dist.sample(generator)

            init_latents = self.vae.config.scaling_factor * init_latents

        if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
            # expand init_latents for batch_size
            deprecation_message = (
                f"You have passed {batch_size} text prompts (`prompt`), but only {init_latents.shape[0]} initial"
                " images (`image`). Initial images are now duplicating to match the number of text prompts. Note"
                " that this behavior is deprecated and will be removed in a version 1.0.0. Please make sure to update"
                " your script to pass as many initial images as text prompts to suppress this warning."
            )
            deprecate("len(prompt) != len(image)", "1.0.0", deprecation_message, standard_warn=False)
            additional_image_per_prompt = batch_size // init_latents.shape[0]
            
            init_latents = torch.cat([init_latents] * additional_image_per_prompt, dim=0)
        elif batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] != 0:
            raise ValueError(
                f"Cannot duplicate `image` of batch size {init_latents.shape[0]} to {batch_size} text prompts."
            )
        else:
            init_latents = torch.cat([init_latents], dim=0)

        shape = init_latents.shape
        noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

        # get latents
        init_latents = self.scheduler.add_noise(init_latents, noise, timestep)
        latents = init_latents

        return latents

3.应用

3.1 问题

python 复制代码
new_map = texture.permute(1, 2, 0)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 4 is not equal to len(dims) = 3

该问题是张量形状的问题,跟通道数毫无关系。

3.2 举例

问:4D 张量:形状为 (B, C, H, W),其中C可以为3吗?

答:4D 张量的形状为 (B,C,H,W),其中 C 表示通道数。通常情况下,C 可以为 3,这对应于 RGB 图像的三个颜色通道(红色、绿色和蓝色)。

相关推荐
迈火5 天前
Facerestore CF (Code Former):ComfyUI人脸修复的卓越解决方案
人工智能·gpt·计算机视觉·stable diffusion·aigc·语音识别·midjourney
重启编程之路6 天前
Stable Diffusion 参数记录
stable diffusion
孤狼warrior9 天前
图像生成 Stable Diffusion模型架构介绍及使用代码 附数据集批量获取
人工智能·python·深度学习·stable diffusion·cnn·transformer·stablediffusion
love530love11 天前
【避坑指南】提示词“闹鬼”?Stable Diffusion 自动注入神秘词汇 xiao yi xian 排查全记录
人工智能·windows·stable diffusion·model keyword
世界尽头与你11 天前
Stable Diffusion web UI 未授权访问漏洞
安全·网络安全·stable diffusion·渗透测试
love530love11 天前
【故障解析】Stable Diffusion WebUI 更换主题后启动报 JSONDecodeError?可能是“主题加载”惹的祸
人工智能·windows·stable diffusion·大模型·json·stablediffusion·gradio 主题
ai_xiaogui16 天前
Stable Diffusion Web UI 绘世版 v4.6.1 整合包:一键极速部署,深度解决 AI 绘画环境配置与 CUDA 依赖难题
人工智能·stable diffusion·环境零配置·高性能内核优化·全功能插件集成·极速部署体验
微学AI17 天前
金仓数据库的新格局:以多模融合开创文档数据库
人工智能·stable diffusion
我的golang之路果然有问题17 天前
开源绘画大模型简单了解
人工智能·ai作画·stable diffusion·人工智能作画
我的golang之路果然有问题17 天前
comfyUI中的动作提取分享
人工智能·stable diffusion·ai绘画·人工智能作画·comfy