Distinguishing the Number of Channels from the Tensor Shape in Stable Diffusion

  • 1. Number of channels
    • 1.1 channel = 3
    • 1.2 channel = 4
  • 2. Tensor shape
    • 2.1 3D tensors
    • 2.2 4D tensors
      • 2.2.1 In general
      • 2.2.2 Stable Diffusion
  • 3. Application
    • 3.1 The problem
    • 3.2 Example

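Preface: in Stable Diffusion code, both the number of channels and the number of tensor dimensions move between the values 3 and 4, so the two are easy to confuse.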

1. Number of channels

1.1 channel = 3

An RGB image has 3 channels (red, green, and blue).

1.2 channel = 4

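Stable Diffusion has 4 latent channels: its VAE encodes a 3-channel RGB image into a 4-channel latent.
Further reading: How to understand channels in convolutional neural networks (如何理解卷积神经网络中的通道)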

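2. Tensor shape

2.1 3D tensors

The shape is (C, H, W), where C is the number of channels, H is the height, and W is the width. This describes a single image.

2.2 4D tensors

2.2.1 In general

The shape is (B, C, H, W), where B is the batch size, C is the number of channels, H is the height, and W is the width. This describes several images at once (for example, batch processing).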
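
The distinction between the two numbers can be made concrete with a small sketch. It models shapes as plain tuples so that it runs without PyTorch; the helper name describe and the example sizes (512 and 64) are assumptions chosen for illustration. The number of dimensions is the length of the shape, while the channel count is one entry inside it.

```python
# Shapes are modelled as plain tuples so the sketch runs without PyTorch.
# For a real tensor `t`, tuple(t.shape) gives the same kind of tuple and
# t.dim() gives its length.

def describe(shape):
    """Return the number of dimensions and the channel count of a shape.

    Assumes a channels-first layout, (C, H, W) or (B, C, H, W), so the
    channel axis is always the third axis counted from the end.
    """
    if len(shape) not in (3, 4):
        raise ValueError(f"expected a 3D or 4D shape, got {shape}")
    return {"ndim": len(shape), "channels": shape[-3]}


rgb_image = (3, 512, 512)      # one RGB image: 3D tensor, 3 channels
rgb_batch = (1, 3, 512, 512)   # batch of one RGB image: 4D tensor, 3 channels
latent_batch = (1, 4, 64, 64)  # batch of one latent: 4D tensor, 4 channels

for name, shape in [
    ("rgb_image", rgb_image),
    ("rgb_batch", rgb_batch),
    ("latent_batch", latent_batch),
]:
    print(name, describe(shape))
```

Note that rgb_batch has 4 dimensions but only 3 channels, which is exactly the case that tends to cause confusion.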

2.2.2 Stable Diffusion

In img2img, the image is encoded with the VAE and then noised according to the timestep:

        # This code is copied from diffusers' pipeline_controlnet_img2img.py
        # 6. Prepare latent variables
        latents = self.prepare_latents(
            image,
            latent_timestep,
            batch_size,
            num_images_per_prompt,
            prompt_embeds.dtype,
            device,
            generator,
        )

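A single image before preprocessing is a 3D tensor, (C, H, W) with C = 3, whereas latents is a 4D tensor, (B, C, H, W) with C = 4. By the time image reaches prepare_latents it has normally already been preprocessed into a batched 4D tensor, which is why the img2img code indexes the channel axis as image.shape[1].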

Let us first look at the prepare_latents function of text2img:

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        if latents is None:
            latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
        else:
            latents = latents.to(device)

        # scale the initial noise by the standard deviation required by the scheduler
        latents = latents * self.scheduler.init_noise_sigma
        return latents

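Clearly, shape already fixes both the number of dimensions of latents (4) and the order of its axes: (batch, channels, height, width).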
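
As a rough illustration of the shape tuple built in that function, the following sketch reproduces only the arithmetic. The function name latent_shape is hypothetical, and the default values (4 latent channels and a VAE scale factor of 8) are assumptions that match the usual Stable Diffusion configuration; a real pipeline reads them from its model configs.

```python
def latent_shape(batch_size, height, width, num_channels_latents=4, vae_scale_factor=8):
    """Mirror the `shape` tuple built in the text2img prepare_latents.

    The defaults are assumptions for illustration, not values read from
    an actual pipeline.
    """
    return (
        batch_size,
        num_channels_latents,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


shape = latent_shape(batch_size=2, height=512, width=768)
print(shape)       # (2, 4, 64, 96)
print(len(shape))  # 4: the latents tensor is always 4D
```

The second entry is the channel count (4), while the length of the tuple is the number of dimensions (also 4 here, which is a coincidence of this case rather than a rule).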

In img2img:

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.prepare_latents
    def prepare_latents(self, image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None):
        if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
            raise ValueError(
                f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
            )

        image = image.to(device=device, dtype=dtype)

        batch_size = batch_size * num_images_per_prompt

        if image.shape[1] == 4:
            init_latents = image

        else:
            if isinstance(generator, list) and len(generator) != batch_size:
                raise ValueError(
                    f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                    f" size of {batch_size}. Make sure the batch size matches the length of the generators."
                )

            elif isinstance(generator, list):
                init_latents = [
                    self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
                ]
                init_latents = torch.cat(init_latents, dim=0)
            else:
                init_latents = self.vae.encode(image).latent_dist.sample(generator)

            init_latents = self.vae.config.scaling_factor * init_latents

        if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
            # expand init_latents for batch_size
            deprecation_message = (
                f"You have passed {batch_size} text prompts (`prompt`), but only {init_latents.shape[0]} initial"
                " images (`image`). Initial images are now duplicating to match the number of text prompts. Note"
                " that this behavior is deprecated and will be removed in a version 1.0.0. Please make sure to update"
                " your script to pass as many initial images as text prompts to suppress this warning."
            )
            deprecate("len(prompt) != len(image)", "1.0.0", deprecation_message, standard_warn=False)
            additional_image_per_prompt = batch_size // init_latents.shape[0]
            
            init_latents = torch.cat([init_latents] * additional_image_per_prompt, dim=0)
        elif batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] != 0:
            raise ValueError(
                f"Cannot duplicate `image` of batch size {init_latents.shape[0]} to {batch_size} text prompts."
            )
        else:
            init_latents = torch.cat([init_latents], dim=0)

        shape = init_latents.shape
        noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

        # get latents
        init_latents = self.scheduler.add_noise(init_latents, noise, timestep)
        latents = init_latents

        return latents
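
To follow how the shape changes through the img2img function above, here is a minimal sketch that tracks shapes only and performs no encoding or noising. The name img2img_latents_shape is hypothetical, and the assumption that the VAE maps (B, 3, H, W) to (B, 4, H/8, W/8) reflects the usual Stable Diffusion VAE rather than anything stated in the code itself.

```python
def img2img_latents_shape(image_shape, batch_size, num_images_per_prompt=1,
                          latent_channels=4, vae_scale_factor=8):
    """Return the shape of `latents` for a given 4D `image_shape`.

    Shape-only sketch of the branches in the img2img prepare_latents.
    latent_channels and vae_scale_factor are assumed defaults.
    """
    if len(image_shape) != 4:
        raise ValueError(f"expected a 4D (B, C, H, W) shape, got {image_shape}")
    n, c, h, w = image_shape
    effective_batch = batch_size * num_images_per_prompt

    if c == latent_channels:
        # The input already has 4 channels, so it is treated as latents.
        init = (n, c, h, w)
    else:
        # Otherwise the VAE encodes it: the channel count becomes 4 and
        # the spatial size shrinks by the scale factor.
        init = (n, latent_channels, h // vae_scale_factor, w // vae_scale_factor)

    if effective_batch > n and effective_batch % n == 0:
        # The latents are repeated along the batch axis.
        init = (effective_batch,) + init[1:]
    elif effective_batch > n:
        raise ValueError(
            f"Cannot duplicate image of batch size {n} to {effective_batch} text prompts."
        )
    # The noise has the same shape, so adding noise leaves the shape unchanged.
    return init


print(img2img_latents_shape((1, 3, 512, 512), batch_size=1))
print(img2img_latents_shape((1, 3, 512, 512), batch_size=2, num_images_per_prompt=2))
```

In every branch the result stays 4D; only the channel entry (3 to 4) and the spatial entries change.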

3. Application

3.1 The problem

new_map = texture.permute(1, 2, 0)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 4 is not equal to len(dims) = 3

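This is a problem with the tensor's shape, more precisely its number of dimensions, and has nothing to do with the number of channels: texture is a 4D tensor (input.dim() = 4), but permute was given only three indices (len(dims) = 3).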
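
Two possible fixes can be sketched on shape tuples, again without requiring PyTorch. The helper permuted_shape is hypothetical and only mirrors the rule that permute needs one index per axis; the shape (1, 3, 256, 256) is an assumed example, since the actual shape of texture is not given. The equivalent calls on a real tensor are shown in the comments.

```python
def permuted_shape(shape, dims):
    """Return the shape that permute(*dims) would produce for `shape`.

    Hypothetical helper for illustration: `dims` must name every axis
    exactly once (negative indices are not handled here).
    """
    if len(dims) != len(shape):
        raise RuntimeError(
            f"input.dim() = {len(shape)} is not equal to len(dims) = {len(dims)}"
        )
    if sorted(dims) != list(range(len(shape))):
        raise RuntimeError(f"dims {dims} is not a permutation of the axes")
    return tuple(shape[d] for d in dims)


texture = (1, 3, 256, 256)  # assumed example shape: (B, C, H, W) with B = 1

# Reproduce the failure: three indices for a 4D shape.
try:
    permuted_shape(texture, (1, 2, 0))
except RuntimeError as err:
    print("error:", err)

# Fix 1: drop the batch axis first (only valid when B == 1), then permute.
# With a real tensor: texture.squeeze(0).permute(1, 2, 0)
hwc = permuted_shape(texture[1:], (1, 2, 0))

# Fix 2: keep the batch axis and permute all four axes.
# With a real tensor: texture.permute(0, 2, 3, 1)
bhwc = permuted_shape(texture, (0, 2, 3, 1))

print(hwc)   # (256, 256, 3)
print(bhwc)  # (1, 256, 256, 3)
```

Which fix is appropriate depends on whether the code that follows expects a single (H, W, C) image or a batch.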

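3.2 Example

Q: For a 4D tensor of shape (B, C, H, W), can C be 3?

A: Yes. In a 4D tensor of shape (B, C, H, W), C is the number of channels. C is commonly 3, which corresponds to the three color channels of RGB images (red, green, and blue); a tensor of shape (B, 3, H, W) is simply a batch of B RGB images.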
