区分stable diffusion中的通道数与张量维度

区分stable diffusion中的通道数与张量形状

  • 1.通道数:
    • [1.1 channel = 3](#1.1 channel = 3)
    • [1.2 channel = 4](#1.2 channel = 4)
  • 2.张量形状
    • [2.1 3D 张量](#2.1 3D 张量)
    • [2.2 4D 张量](#2.2 4D 张量)
      • [2.2.1 通常](#2.2.1 通常)
      • [2.2.2 stable diffusion](#2.2.2 stable diffusion)
  • 3.应用
    • [3.1 问题](#3.1 问题)
    • [3.2 举例](#3.2 举例)

前言:通道数与张量形状都在数值3和4之间变换,容易混淆。

1.通道数:

1.1 channel = 3

RGB 图像具有 3 个通道(红色、绿色和蓝色)。

1.2 channel = 4

Stable Diffusion has 4 latent channels。
如何理解卷积神经网络中的通道(channel)

2.张量形状

2.1 3D 张量

形状为 (C, H, W),其中 C 是通道数,H 是高度,W 是宽度。这适用于单个图像。

2.2 4D 张量

2.2.1 通常

形状为 (B, C, H, W),其中 B 是批次大小,C 是通道数,H 是高度,W 是宽度。这适用于多个图像(例如,批量处理)。

2.2.2 stable diffusion

在img2img中,将image用vae编码并按照timestep加噪:

python 复制代码
		# This code copyed from diffusers.pipline_controlnet_img2img.py
        # 6. Prepare latent variables
        latents = self.prepare_latents(
            image,
            latent_timestep,
            batch_size,
            num_images_per_prompt,
            prompt_embeds.dtype,
            device,
            generator,
        )

image的dim(维度)是3,而latents的dim为4。

让我们先看text2img的prepare_latents函数:

python 复制代码
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        if latents is None:
            latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
        else:
            latents = latents.to(device)

        # scale the initial noise by the standard deviation required by the scheduler
        latents = latents * self.scheduler.init_noise_sigma
        return latents

显然,shape已经规定了latents的dim(4)和排列顺序。

在img2img中:

python 复制代码
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.prepare_latents
    def prepare_latents(self, image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None):
        if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
            raise ValueError(
                f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
            )

        image = image.to(device=device, dtype=dtype)

        batch_size = batch_size * num_images_per_prompt

        if image.shape[1] == 4:
            init_latents = image

        else:
            if isinstance(generator, list) and len(generator) != batch_size:
                raise ValueError(
                    f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                    f" size of {batch_size}. Make sure the batch size matches the length of the generators."
                )

            elif isinstance(generator, list):
                init_latents = [
                    self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
                ]
                init_latents = torch.cat(init_latents, dim=0)
            else:
                init_latents = self.vae.encode(image).latent_dist.sample(generator)

            init_latents = self.vae.config.scaling_factor * init_latents

        if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
            # expand init_latents for batch_size
            deprecation_message = (
                f"You have passed {batch_size} text prompts (`prompt`), but only {init_latents.shape[0]} initial"
                " images (`image`). Initial images are now duplicating to match the number of text prompts. Note"
                " that this behavior is deprecated and will be removed in a version 1.0.0. Please make sure to update"
                " your script to pass as many initial images as text prompts to suppress this warning."
            )
            deprecate("len(prompt) != len(image)", "1.0.0", deprecation_message, standard_warn=False)
            additional_image_per_prompt = batch_size // init_latents.shape[0]
            
            init_latents = torch.cat([init_latents] * additional_image_per_prompt, dim=0)
        elif batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] != 0:
            raise ValueError(
                f"Cannot duplicate `image` of batch size {init_latents.shape[0]} to {batch_size} text prompts."
            )
        else:
            init_latents = torch.cat([init_latents], dim=0)

        shape = init_latents.shape
        noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

        # get latents
        init_latents = self.scheduler.add_noise(init_latents, noise, timestep)
        latents = init_latents

        return latents

3.应用

3.1 问题

python 复制代码
new_map = texture.permute(1, 2, 0)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 4 is not equal to len(dims) = 3

该问题是张量形状的问题,跟通道数毫无关系。

3.2 举例

问:4D 张量:形状为 (B, C, H, W),其中C可以为3吗?

答:4D 张量的形状为 (B,C,H,W),其中 C 表示通道数。通常情况下,C 可以为 3,这对应于 RGB 图像的三个颜色通道(红色、绿色和蓝色)。

相关推荐
wei_shuo2 天前
GpuGeek 实操指南:So-VITS-SVC 语音合成与 Stable Diffusion 文生图双模型搭建,融合即梦 AI 的深度实践
人工智能·stable diffusion·gpu算力·gpuseek
这是一个懒人4 天前
Stable Diffusion WebUI 插件大全:功能详解与下载地址
stable diffusion
浪淘沙jkp4 天前
AI大模型学习十八、利用Dify+deepseekR1 +本地部署Stable Diffusion搭建 AI 图片生成应用
人工智能·stable diffusion·agent·dify·ollama·deepseek
Icoolkj5 天前
深入了解 Stable Diffusion:AI 图像生成的奥秘
人工智能·stable diffusion
这是一个懒人5 天前
mac 快速安装stable diffusion webui
macos·stable diffusion
璇转的鱼6 天前
Stable Diffusion进阶之Controlnet插件使用
人工智能·ai作画·stable diffusion·aigc·ai绘画
AloneCat20127 天前
stable Diffusion模型结构
stable diffusion
西西弗Sisyphus7 天前
Stable Diffusion XL 文生图
stable diffusion
霍志杰8 天前
stable-diffusion windows本地部署
windows·stable diffusion
昨日之日20068 天前
ACE-Step - 20秒生成4分钟完整歌曲,音乐界的Stable Diffusion,支持50系显卡 本地一键整合包下载
计算机视觉·stable diffusion·音视频