Distinguishing channel count from tensor dimensions in Stable Diffusion


  • 1. Channel count
    • 1.1 channel = 3
    • 1.2 channel = 4
  • 2. Tensor shape
    • 2.1 3D tensor
    • 2.2 4D tensor
      • 2.2.1 In general
      • 2.2.2 Stable Diffusion
  • 3. Application
    • 3.1 The problem
    • 3.2 Example

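Preface: in Stable Diffusion, both the channel count and the number of tensor dimensions switch between the values 3 and 4, which makes the two easy to confuse.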

1. Channel count

1.1 channel = 3

An RGB image has 3 channels (red, green, and blue).

1.2 channel = 4

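Stable Diffusion has 4 latent channels: the VAE encodes a 3-channel RGB image into a 4-channel latent.
See also: "How to understand channels in convolutional neural networks".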
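
The two quantities can be read off a tensor separately. The following is a minimal sketch in PyTorch; the tensor sizes and the helper name `channel_count` are illustrative assumptions, not part of any library. It shows that the channel count and the number of dimensions vary independently.

```python
import torch


def channel_count(t: torch.Tensor) -> int:
    """Size of the channel axis for (C, H, W) or (B, C, H, W) layouts."""
    return t.shape[-3]


# One RGB image: 3 dimensions, 3 channels.
rgb_image = torch.rand(3, 512, 512)
# A batch of two RGB images: 4 dimensions, still 3 channels.
rgb_batch = torch.rand(2, 3, 512, 512)
# One latent: 3 dimensions, 4 channels.
latent = torch.randn(4, 64, 64)
# A batch of one latent: 4 dimensions, 4 channels.
latent_batch = torch.randn(1, 4, 64, 64)

for name, t in [("rgb_image", rgb_image), ("rgb_batch", rgb_batch),
                ("latent", latent), ("latent_batch", latent_batch)]:
    print(f"{name}: dim={t.dim()}, channels={channel_count(t)}")
```

Only `rgb_image` and `latent_batch` have matching numbers (3 and 3, 4 and 4), which is likely where the confusion comes from.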

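2. Tensor shape

2.1 3D tensor

The shape is (C, H, W), where C is the channel count, H is the height, and W is the width. This layout describes a single image.

2.2 4D tensor

2.2.1 In general

The shape is (B, C, H, W), where B is the batch size, C is the channel count, H is the height, and W is the width. This layout describes several images at once (for example, in batch processing).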
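
Moving between the 3D and 4D layouts only adds or removes the batch axis; the channel count stays the same. A small sketch in PyTorch, with illustrative image sizes:

```python
import torch

image_a = torch.rand(3, 256, 256)  # (C, H, W)
image_b = torch.rand(3, 256, 256)  # (C, H, W)

# Add a batch axis of size 1 in front: (C, H, W) -> (1, C, H, W).
single = image_a.unsqueeze(0)

# Stack several 3D images into one 4D batch: (B, C, H, W) with B = 2.
batch = torch.stack([image_a, image_b], dim=0)

# Indexing the batch axis recovers a 3D image again.
first = batch[0]

print(tuple(single.shape))  # (1, 3, 256, 256)
print(tuple(batch.shape))   # (2, 3, 256, 256)
print(tuple(first.shape))   # (3, 256, 256)
```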

2.2.2 Stable Diffusion

In img2img, the image is encoded with the VAE and then noised according to the timestep:

        # This code is copied from diffusers' pipeline_controlnet_img2img.py
        # 6. Prepare latent variables
        latents = self.prepare_latents(
            image,
            latent_timestep,
            batch_size,
            num_images_per_prompt,
            prompt_embeds.dtype,
            device,
            generator,
        )

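A single image is a 3D tensor (C, H, W), whereas latents is a 4D tensor (B, C, H, W). The `image.shape[1]` check in the img2img version of `prepare_latents` further down implies that the image has already been batched into 4D by the time that function runs.

Let us first look at the text2img `prepare_latents` function: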

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        if latents is None:
            latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
        else:
            latents = latents.to(device)

        # scale the initial noise by the standard deviation required by the scheduler
        latents = latents * self.scheduler.init_noise_sigma
        return latents

Clearly, `shape` fixes both the number of dimensions of `latents` (4) and the order of its axes: (batch, channels, height, width).
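
To make the shape arithmetic concrete, here is a sketch that mirrors only the `shape` line of the function above. It assumes `vae_scale_factor = 8` and `num_channels_latents = 4`, which are typical but not guaranteed for every checkpoint, and it uses plain `torch.randn` as a stand-in for the `randn_tensor` helper. The function name `latent_shape` is hypothetical.

```python
import torch


def latent_shape(batch_size, num_channels_latents, height, width, vae_scale_factor=8):
    """Same arithmetic as the `shape = (...)` line in prepare_latents."""
    return (
        batch_size,
        num_channels_latents,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


shape = latent_shape(batch_size=2, num_channels_latents=4, height=512, width=512)
print(shape)  # (2, 4, 64, 64)

# Stand-in for randn_tensor: sample initial noise with that shape.
generator = torch.Generator().manual_seed(0)
latents = torch.randn(shape, generator=generator)
print(latents.dim())  # 4
```

Under these assumptions, a 512x512 request yields 64x64 latents with 4 channels, and the tensor is 4D regardless of the batch size.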

In img2img, `prepare_latents` instead takes the shape from the VAE-encoded image:

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.prepare_latents
    def prepare_latents(self, image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None):
        if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
            raise ValueError(
                f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
            )

        image = image.to(device=device, dtype=dtype)

        batch_size = batch_size * num_images_per_prompt

        if image.shape[1] == 4:
            init_latents = image

        else:
            if isinstance(generator, list) and len(generator) != batch_size:
                raise ValueError(
                    f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                    f" size of {batch_size}. Make sure the batch size matches the length of the generators."
                )

            elif isinstance(generator, list):
                init_latents = [
                    self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
                ]
                init_latents = torch.cat(init_latents, dim=0)
            else:
                init_latents = self.vae.encode(image).latent_dist.sample(generator)

            init_latents = self.vae.config.scaling_factor * init_latents

        if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
            # expand init_latents for batch_size
            deprecation_message = (
                f"You have passed {batch_size} text prompts (`prompt`), but only {init_latents.shape[0]} initial"
                " images (`image`). Initial images are now duplicating to match the number of text prompts. Note"
                " that this behavior is deprecated and will be removed in a version 1.0.0. Please make sure to update"
                " your script to pass as many initial images as text prompts to suppress this warning."
            )
            deprecate("len(prompt) != len(image)", "1.0.0", deprecation_message, standard_warn=False)
            additional_image_per_prompt = batch_size // init_latents.shape[0]
            
            init_latents = torch.cat([init_latents] * additional_image_per_prompt, dim=0)
        elif batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] != 0:
            raise ValueError(
                f"Cannot duplicate `image` of batch size {init_latents.shape[0]} to {batch_size} text prompts."
            )
        else:
            init_latents = torch.cat([init_latents], dim=0)

        shape = init_latents.shape
        noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

        # get latents
        init_latents = self.scheduler.add_noise(init_latents, noise, timestep)
        latents = init_latents

        return latents
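
The `image.shape[1] == 4` branch above is a channel-count test on a 4D tensor: an input that already has 4 channels is treated as latents and is not encoded again. The sketch below reproduces only that decision. `toy_encoder` is a hypothetical stand-in (a strided convolution), not the real VAE; it is there only so the output shape matches what an 8x-downscaling encoder with 4 latent channels would produce.

```python
import torch

# Stand-in for the VAE encoder (NOT the real model): maps
# (B, 3, H, W) to (B, 4, H/8, W/8) so that only the shapes are realistic.
toy_encoder = torch.nn.Conv2d(in_channels=3, out_channels=4, kernel_size=8, stride=8)


def to_init_latents(image: torch.Tensor) -> torch.Tensor:
    """Pass 4-channel inputs through unchanged; encode everything else."""
    if image.dim() != 4:
        raise ValueError(f"expected a 4D (B, C, H, W) tensor, got {image.dim()}D")
    if image.shape[1] == 4:  # axis 1 is the channel axis
        return image
    with torch.no_grad():
        return toy_encoder(image)


pixels = torch.rand(2, 3, 512, 512)         # batch of RGB images
ready_latents = torch.randn(2, 4, 64, 64)   # batch of latents

init_from_pixels = to_init_latents(pixels)
init_from_latents = to_init_latents(ready_latents)

# Noise is sampled with the same 4D shape as the latents.
noise = torch.randn_like(init_from_pixels)

print(tuple(init_from_pixels.shape))   # (2, 4, 64, 64)
print(tuple(init_from_latents.shape))  # (2, 4, 64, 64)
```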

3. Application

3.1 The problem

    new_map = texture.permute(1, 2, 0)
    RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 4 is not equal to len(dims) = 3

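This error is about the number of tensor dimensions and has nothing to do with the channel count: `texture` is a 4D tensor, but `permute` was given only 3 axis indices.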
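
Two possible fixes are sketched below, assuming `texture` is laid out as (B, C, H, W); the concrete sizes are made up for illustration. Either remove the batch axis before permuting, or pass `permute` one index per axis.

```python
import torch

texture = torch.rand(1, 3, 64, 64)  # assumed layout: (B, C, H, W)

# Reproduce the failure: 3 indices for a 4D tensor.
try:
    texture.permute(1, 2, 0)
except RuntimeError as err:
    print("permute failed:", err)

# Option 1: select one image first, then permute (C, H, W) -> (H, W, C).
hwc = texture[0].permute(1, 2, 0)

# Option 2: keep the batch axis and permute all four axes,
# (B, C, H, W) -> (B, H, W, C).
bhwc = texture.permute(0, 2, 3, 1)

print(tuple(hwc.shape))   # (64, 64, 3)
print(tuple(bhwc.shape))  # (1, 64, 64, 3)
```

Which option fits depends on whether the downstream code expects a single image or a batch.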

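3.2 Example

Q: For a 4D tensor of shape (B, C, H, W), can C be 3?

A: Yes. C is the channel count and is independent of the number of dimensions. C = 3 is the usual case for a batch of RGB images, one channel each for red, green, and blue.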
