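AI Image Generation with Python: A Guide to Running Stable Diffusion Locally

Want AI image generation without sending your data to a third party or paying for API calls? This guide walks through a complete local Stable Diffusion setup in Python, from preparing the environment to producing your first image.

1. What Is Stable Diffusion?

Stable Diffusion (SD) is an open-source text-to-image deep learning model: give it a text description and it generates a matching image.

1.1 Why Run It Locally?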

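┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│ Cloud services vs. local deployment                                                           │
├─────────────────┬──────────────────────────────────────┬──────────────────────────────────────┤
│ Dimension       │ Cloud service (Midjourney, etc.)     │ Local deployment (SD)                │
├─────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Cost            │ Monthly subscription or per-use fees │ One-time hardware investment         │
│ Privacy         │ Data is uploaded to the cloud        │ ✅ Everything stays on your machine  │
│ Freedom         │ Subject to platform moderation       │ ✅ No platform-side restrictions     │
│ Customization   │ Fixed models                         │ ✅ Swap models and LoRAs freely      │
│ API integration │ Limited or paid                      │ ✅ Full control from Python          │
│ Hardware        │ None                                 │ NVIDIA GPU required                  │
└─────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘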

1.2 SD Version History

SD 1.4 (2022) → SD 1.5 (2022) → SD 2.0/2.1 (2022) → SDXL (2023) → SD 3.0 (2024) → SD 3.5 (2024)
   │                   │                                   │
   └─ classic, stable ─┘                                   └── high quality, recommended

2. Hardware and Environment Requirements

2.1 Minimum and Recommended Specs

Hardware	Minimum	Recommended
GPU	NVIDIA, 8 GB VRAM	NVIDIA, 12 GB+ VRAM (RTX 3060 or better)
RAM	16 GB	32 GB
Disk	10 GB	50 GB+ SSD
CUDA	11.7+	12.x
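
As a rough illustration, the VRAM tiers in the table can be turned into a starting configuration. This is only a sketch: the 8 GB and 12 GB thresholds come from the table, the 4 GB floor is borrowed from the low-VRAM section later in this guide, and the function name, the returned keys, and the idea that 12 GB is enough for SDXL at 1024x1024 are assumptions rather than measured limits.

```python
def recommend_profile(vram_gb: float) -> dict:
    """Map available VRAM to a rough starting configuration (rule of thumb only)."""
    if vram_gb >= 12:
        # "Recommended" tier in the table: assume SDXL at its native 1024x1024 fits.
        return {"model": "sdxl", "size": 1024, "cpu_offload": False}
    if vram_gb >= 8:
        # "Minimum" tier in the table: SD 1.5 at its native 512x512.
        return {"model": "sd15", "size": 512, "cpu_offload": False}
    if vram_gb >= 4:
        # Below the minimum: SD 1.5 with CPU offload switched on.
        return {"model": "sd15", "size": 512, "cpu_offload": True}
    raise ValueError(f"{vram_gb} GB of VRAM is below what this guide covers")


for gb in (6, 8, 16):
    print(gb, "GB ->", recommend_profile(gb))
```

Treat the result as a first guess and adjust it after watching actual memory use on your own card.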

2.2 Setting Up the Environment

bash

# 1. Create a Python virtual environment (Python 3.10 recommended)
conda create -n sd python=3.10 -y
conda activate sd

# 2. Install PyTorch (CUDA 12.1 build)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 3. Check that CUDA is available (the device name is only queried when it is)
python -c "import torch; ok = torch.cuda.is_available(); print(f'CUDA: {ok}, GPU: {torch.cuda.get_device_name(0) if ok else None}')"
# Example output: CUDA: True, GPU: NVIDIA GeForce RTX 4060

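3. Option 1: The Diffusers Library (for Developers)

diffusers is Hugging Face's official library for diffusion models. Everything is exposed through a pure-Python API, which makes it the best fit for Python developers who want to integrate image generation into their own projects.

3.1 Installation

bash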

pip install diffusers transformers accelerate safetensors

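3.2 Basic Text-to-Image

python

import torch
from diffusers import StableDiffusionPipeline

def basic_text2img(prompt: str, output_path: str = "output.png"):
    """
    Basic text-to-image: text in, image out.

    Pipeline: text prompt → CLIP text encoder → UNet denoising → VAE decoder → image
    """
    # Load the model (downloaded automatically on first use, roughly 4-7 GB).
    # If this repo ID no longer resolves, the community mirror
    # "stable-diffusion-v1-5/stable-diffusion-v1-5" hosts the same weights.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,       # half precision to save VRAM
        safety_checker=None              # disable the safety checker (optional)
    )
    pipe.to("cuda")

    # VRAM optimization
    pipe.enable_attention_slicing()      # compute attention in slices to lower peak VRAM

    # Generate the image
    image = pipe(
        prompt=prompt,
        num_inference_steps=30,          # denoising steps (20-50; more is finer but slower)
        guidance_scale=7.5,              # CFG scale (7-12; higher follows the prompt more closely)
        width=512,
        height=512
    ).images[0]

    image.save(output_path)
    print(f"Image saved: {output_path}")
    return image

# Example usage
basic_text2img(
    "a beautiful sunset over the ocean, highly detailed, 4k, photorealistic",
    "sunset.png"
)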
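
To make the guidance_scale comment more concrete: classifier-free guidance (CFG) runs the denoiser twice per step, once with the prompt and once without it (or with the negative prompt), and extrapolates from the second prediction toward the first. The sketch below shows only that combination step, with plain Python lists standing in for tensors; apply_cfg and the sample numbers are hypothetical and not part of diffusers.

```python
def apply_cfg(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward (and past) the text-conditioned one."""
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


uncond = [0.0, 1.0, -1.0]   # prediction without the prompt
text = [1.0, 1.0, 0.0]      # prediction with the prompt

print(apply_cfg(uncond, text, 1.0))   # scale 1.0 gives back the text-conditioned prediction
print(apply_cfg(uncond, text, 7.5))   # larger scales exaggerate the difference
```

Where the two predictions agree, the scale has no effect; where they differ, a large scale amplifies the gap, which is why very high values can make images look stiff or oversaturated.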

3.3 Using SDXL (Higher Quality)

python

import torch
from diffusers import StableDiffusionXLPipeline

def sdxl_text2img(prompt: str, output_path: str = "sdxl_output.png"):
    """
    SDXL text-to-image: noticeably higher quality than SD 1.5.

    SDXL advantages:
    - native 1024x1024 resolution
    - better prompt understanding
    - more realistic color and detail
    """
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16"
    )
    pipe.to("cuda")

    # VRAM optimization (SDXL is larger, so this matters more)
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()

    image = pipe(
        prompt=prompt,
        negative_prompt="blurry, low quality, distorted, deformed",
        num_inference_steps=40,
        guidance_scale=8.0,
        width=1024,
        height=1024
    ).images[0]

    image.save(output_path)
    print(f"SDXL image saved: {output_path}")
    return image

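3.4 Key Parameters

┌────────────────────────────────────────────────────────────────────────────────┐
│ Core Stable Diffusion parameters                                               │
├─────────────────────┬──────────────────────────────────────────────────────────┤
│ Parameter           │ Description                                              │
├─────────────────────┼──────────────────────────────────────────────────────────┤
│ prompt              │ Positive prompt: what you want in the image              │
│ negative_prompt     │ Negative prompt: what you want to keep out               │
│ num_inference_steps │ Denoising steps, 20-50 (more: higher quality, slower)    │
│ guidance_scale      │ CFG scale, 7-12 (higher: closer to the prompt, but       │
│                     │ the image can look stiff)                                │
│ width / height      │ Image size in pixels: 512 / 768 / 1024                   │
│ generator (seed)    │ Random seed: a fixed seed reproduces the same image      │
└─────────────────────┴──────────────────────────────────────────────────────────┘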
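
One way to keep these parameters together is a small settings object that flags values outside the ranges suggested in the table. The sketch below is hypothetical: GenParams and problems() are illustration names, and the ranges are the table's suggestions, not hard limits. The divisible-by-8 check reflects the requirement that Stable Diffusion pipelines place on width and height.

```python
from dataclasses import dataclass


@dataclass
class GenParams:
    """Hypothetical container for the main generation parameters."""
    prompt: str
    negative_prompt: str = ""
    num_inference_steps: int = 30
    guidance_scale: float = 7.5
    width: int = 512
    height: int = 512

    def problems(self) -> list:
        """Return human-readable warnings; an empty list means nothing looks off."""
        issues = []
        if not self.prompt.strip():
            issues.append("prompt is empty")
        if not 20 <= self.num_inference_steps <= 50:
            issues.append("num_inference_steps outside the suggested 20-50 range")
        if not 7 <= self.guidance_scale <= 12:
            issues.append("guidance_scale outside the suggested 7-12 range")
        for name in ("width", "height"):
            if getattr(self, name) % 8 != 0:
                # The latent space is 8x smaller than the image, so sizes must divide evenly.
                issues.append(f"{name} must be divisible by 8")
        return issues


print(GenParams(prompt="a red fox in the snow").problems())
print(GenParams(prompt="a red fox", num_inference_steps=5, width=500).problems())
```

The fields use the same names as the pipeline's keyword arguments, so a validated instance could be unpacked into a pipeline call with dataclasses.asdict().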

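3.5 Working Around Limited VRAM

python

def low_vram_generate(pipe, prompt, output_path="output.png"):
    """
    Low-VRAM generation (can run with as little as 4-6 GB of VRAM).

    Pass in a pipeline that has NOT been moved to the GPU with .to("cuda");
    CPU offload manages device placement itself.
    """
    # 1. CPU offload: each sub-model is moved to the GPU only while it runs,
    #    then returned to the CPU (requires the accelerate package)
    pipe.enable_model_cpu_offload()

    # 2. Attention slicing: lowers peak VRAM during attention
    pipe.enable_attention_slicing()

    # 3. VAE slicing: decode a batch one image at a time
    pipe.enable_vae_slicing()

    # 4. Lower the resolution
    image = pipe(
        prompt=prompt,
        width=512,
        height=512,
        num_inference_steps=20   # fewer steps: faster, though VRAM use is unchanged
    ).images[0]

    image.save(output_path)
    return image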
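4. Option 2: Stable Diffusion WebUI (for Non-Developers)

If you prefer a graphical interface, AUTOMATIC1111's WebUI is one of the most popular choices.

4.1 Installing the WebUI

bash

# 1. Clone the repository
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# 2. Windows: run the launcher script (from cmd, or double-click it in Explorer)
webui-user.bat

# 3. Linux
./webui.sh

Once startup finishes, open http://127.0.0.1:7860 in your browser to reach the web interface (pass --autolaunch if you want the browser to open automatically).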

4.2 Calling the WebUI API from Python

python

import base64
import json

import requests

def sd_webui_api(prompt: str, output_path: str = "api_output.png"):
    """
    Call the local WebUI through its HTTP API.
    Start the WebUI with the API enabled first:
    webui.bat --api (Windows) or ./webui.sh --api (Linux)
    """
    url = "http://127.0.0.1:7860/sdapi/v1/txt2img"

    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality, deformed",
        "steps": 30,
        "cfg_scale": 7.5,
        "width": 512,
        "height": 512,
        "sampler_name": "DPM++ 2M Karras",
        "seed": -1  # -1 means random
    }

    response = requests.post(url, json=payload, timeout=600)
    response.raise_for_status()  # fail loudly on HTTP errors
    result = response.json()

    # Decode and save the image
    image_data = base64.b64decode(result["images"][0])
    with open(output_path, "wb") as f:
        f.write(image_data)

    print(f"API generation finished: {output_path}")
    # Print the seed so the result can be reproduced.
    # "info" is a JSON-encoded string, so decode it before reading fields.
    info = json.loads(result.get("info") or "{}")
    print(f"Seed: {info.get('seed', 'unknown')}")

# Usage
sd_webui_api("a cyberpunk city at night, neon lights, rain, 4k")
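
When a request returns several images, it helps to keep the response handling separate from the HTTP call so it can be tested without a running server. The sketch below does that under two assumptions: that the response has the shape used above (an "images" list of base64 strings), and that "info" is a JSON-encoded string, which is how the WebUI versions I am aware of return it. save_webui_images and the file naming scheme are hypothetical.

```python
import base64
import json
import tempfile
from pathlib import Path


def save_webui_images(result: dict, output_dir: str) -> list:
    """Decode every image in a txt2img response and write it to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # "info" is assumed to be a JSON string, so it needs a second decode.
    info = json.loads(result.get("info") or "{}")
    seed = info.get("seed", "unknown")

    paths = []
    for i, encoded in enumerate(result.get("images", [])):
        # Drop a "data:image/png;base64," prefix if a client added one.
        payload = encoded.split(",", 1)[-1]
        path = out / f"seed_{seed}_{i:02d}.png"
        path.write_bytes(base64.b64decode(payload))
        paths.append(path)
    return paths


# Offline check with a fake response (the bytes are a placeholder, not a real PNG).
fake = {
    "images": [base64.b64encode(b"not-a-real-png").decode()],
    "info": json.dumps({"seed": 1234}),
}
with tempfile.TemporaryDirectory() as tmp:
    print([p.name for p in save_webui_images(fake, tmp)])
```

In real use, the dictionary returned by response.json() would be passed in place of the fake response.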

5. Managing Models

5.1 Downloading Models

python

"""
Model download guide

Recommended sources:
1. Hugging Face: https://huggingface.co/models?pipeline_tag=text-to-image
2. Civitai:      https://civitai.com (one of the largest SD model communities)

Commonly used models:
"""
MODELS = {
    # ---- SD 1.5 family ----
    "sd15": "runwayml/stable-diffusion-v1-5",
    "anything-v5": "stablediffusionapi/anything-v5",          # anime style
    "realistic-vision": "SG161222/Realistic_Vision_V5.1",     # realistic portraits

    # ---- SDXL family ----
    "sdxl": "stabilityai/stable-diffusion-xl-base-1.0",
    "sdxl-turbo": "stabilityai/sdxl-turbo",                   # fast generation
    "juggernaut-xl": "RunDiffusion/Juggernaut-XL-v9",        # high-quality all-rounder

    # ---- SD 3.0+ ----
    # Gated repo: accept the license on Hugging Face first. The "-diffusers"
    # variant is the one laid out for from_pretrained().
    "sd3": "stabilityai/stable-diffusion-3-medium-diffusers",
}
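
Each model family needs its own pipeline class: SD 1.5 checkpoints load with StableDiffusionPipeline, SDXL checkpoints with StableDiffusionXLPipeline, and SD 3 with StableDiffusion3Pipeline. The lookup below is a hypothetical sketch of how the short names above could be tied to a class name. The family assignments are my reading of the groupings in the dictionary, and diffusers also provides AutoPipelineForText2Image, which can pick the class for you.

```python
# Pipeline class name for each model family.
PIPELINE_BY_FAMILY = {
    "sd15": "StableDiffusionPipeline",
    "sdxl": "StableDiffusionXLPipeline",
    "sd3": "StableDiffusion3Pipeline",
}

# Hypothetical registry: short model name -> family it belongs to.
MODEL_FAMILY = {
    "sd15": "sd15",
    "anything-v5": "sd15",
    "realistic-vision": "sd15",
    "sdxl": "sdxl",
    "sdxl-turbo": "sdxl",
    "juggernaut-xl": "sdxl",
    "sd3": "sd3",
}


def pipeline_class_name(model_key: str) -> str:
    """Return the diffusers pipeline class name to use for a short model name."""
    try:
        return PIPELINE_BY_FAMILY[MODEL_FAMILY[model_key]]
    except KeyError:
        raise ValueError(f"unknown model key: {model_key!r}") from None


print(pipeline_class_name("juggernaut-xl"))
print(pipeline_class_name("anything-v5"))
```

Loading an SDXL checkpoint with the SD 1.5 class (or the reverse) fails because the architectures differ, so a lookup like this catches the mismatch before any download starts.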

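5.2 Loading a Local Model

python

import torch
from diffusers import StableDiffusionPipeline

def load_local_model(model_path: str):
    """
    Load a model from a single checkpoint file or a Hugging Face-format folder.

    model_path can be:
    - a local folder: "./models/sd15"
    - a Hugging Face repo ID: "runwayml/stable-diffusion-v1-5"
    - a local .safetensors file: "./models/model.safetensors"
    """
    if model_path.endswith((".safetensors", ".ckpt")):
        # Single-file checkpoints (the format most Civitai downloads use)
        pipe = StableDiffusionPipeline.from_single_file(
            model_path,
            torch_dtype=torch.float16
        )
    else:
        # Folders and repo IDs use the multi-folder diffusers layout
        pipe = StableDiffusionPipeline.from_pretrained(
            model_path,
            torch_dtype=torch.float16
        )
    pipe.to("cuda")
    return pipe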

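5.3 Changing Style with LoRA

python

def apply_lora(pipe, lora_path: str, lora_scale: float = 0.8):
    """
    Load a LoRA style adapter and generate one image with it.

    LoRA: a lightweight adapter that changes the drawing style without
    modifying the base model's weights.
    Examples: anime, watercolor, or a particular artist's style.
    """
    pipe.load_lora_weights(lora_path)

    # Control the adapter's strength at generation time via cross_attention_kwargs
    image = pipe(
        prompt="a girl in a garden, masterpiece",
        cross_attention_kwargs={"scale": lora_scale}
    ).images[0]
    return image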
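
The reason a LoRA file is small and leaves the base model untouched is that it stores a low-rank update for selected weight matrices: the effective weight is W + scale * (up @ down), where up and down are two thin matrices. The sketch below works through that arithmetic on a 3x3 matrix with a rank-1 adapter, using nested lists instead of tensors. The helper names are hypothetical, and real adapters also fold in an alpha/rank factor that is omitted here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora_delta(weight, lora_down, lora_up, scale):
    """Return weight + scale * (lora_up @ lora_down) without modifying `weight`."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# A 3x3 base weight and a rank-1 adapter: a 3x1 "up" matrix times a 1x3 "down" matrix.
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
up = [[1.0], [0.0], [2.0]]
down = [[0.5, 0.0, 1.0]]

print(apply_lora_delta(base, down, up, scale=0.0))  # scale 0 leaves the base weights as they were
print(apply_lora_delta(base, down, up, scale=0.5))
```

For a d x k weight matrix, a rank-r adapter stores r * (d + k) numbers instead of d * k, and the scale value plays the same role as lora_scale in the function above.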

6. Advanced Features

6.1 Image-to-Image

python

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

def image_to_image(init_image_path: str, prompt: str, strength: float = 0.7):
    """
    Image-to-image: redraw an existing picture under the guidance of a prompt.

    Args:
        init_image_path: path to the source image
        prompt: prompt describing the redraw
        strength: how much to change (0.0-1.0)
                  0.3 → light touch-up
                  0.7 → moderate change
                  0.9 → almost a full redraw
    """
    init_image = Image.open(init_image_path).convert("RGB")
    init_image = init_image.resize((512, 512))

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,
        num_inference_steps=30
    ).images[0]

    image.save("img2img_output.png")
    print("Image-to-image finished")
    return image

# Example: turn a photo into an oil painting
image_to_image("photo.jpg", "oil painting style, masterpiece, highly detailed")
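
The strength parameter also changes how much work is done: the source image is noised part of the way, and only the remaining portion of the schedule is denoised. As far as I know, diffusers computes the number of steps actually run as min(int(num_inference_steps * strength), num_inference_steps); the sketch below mirrors that rule. img2img_steps is a hypothetical helper, and the exact behavior may differ between library versions.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"strength={s}: {img2img_steps(30, s)} of 30 steps")
```

A low strength therefore finishes faster as well as staying closer to the original, so it can be worth raising num_inference_steps when using small strength values.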

6.2 Inpainting

python

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

def inpaint(image_path, mask_path, prompt):
    """
    Inpainting: repaint only the region covered by the mask.

    Use cases:
    - removing an object from the picture
    - replacing the background
    - changing a person's clothing
    """
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    # Mask convention: white pixels are repainted, black pixels are kept
    mask = Image.open(mask_path).convert("L").resize((512, 512))

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16
    ).to("cuda")

    result = pipe(
        prompt=prompt,
        image=image,
        mask_image=mask,
        num_inference_steps=30
    ).images[0]

    result.save("inpaint_output.png")
    return result

6.3 ControlNet: Precise Control over Composition

python

import cv2  # pip install opencv-python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

def controlnet_canny(image_path: str, prompt: str):
    """
    ControlNet: steer the composition of the generated image with an edge map.

    Several control modes are available:
    - Canny edges: precise outline control
    - Depth: spatial structure control
    - Pose: human pose control
    - Scribble: rough sketch control
    """
    # Load the ControlNet model
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny",
        torch_dtype=torch.float16
    )

    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16
    ).to("cuda")

    # Load the input image and extract its edges
    input_image = load_image(image_path)
    image_np = np.array(input_image)
    edges = cv2.Canny(image_np, 100, 200)
    # Canny returns a single channel; stack it into a 3-channel image,
    # as the diffusers ControlNet examples do
    edges = np.stack([edges] * 3, axis=-1)
    canny_image = Image.fromarray(edges)

    # Generate
    image = pipe(
        prompt=prompt,
        image=canny_image,
        num_inference_steps=30,
        guidance_scale=7.5
    ).images[0]

    image.save("controlnet_output.png")
    return image
7. Batch Generation and Automation

7.1 Generating Multiple Styles in a Batch

python

import os
import re

import torch
from diffusers import StableDiffusionPipeline

def batch_generate(pipe, base_prompt: str, styles: list, output_dir: str = "./outputs"):
    """
    Generate one image per style from the same base prompt.
    """
    os.makedirs(output_dir, exist_ok=True)

    for i, style in enumerate(styles):
        prompt = f"{base_prompt}, {style}"
        image = pipe(
            prompt=prompt,
            num_inference_steps=30,
            guidance_scale=7.5
        ).images[0]

        # Build a filesystem-friendly name ("photorealistic, 8k" -> "photorealistic_8k")
        slug = re.sub(r"[^A-Za-z0-9]+", "_", style).strip("_")
        filename = f"{output_dir}/gen_{i:03d}_{slug}.png"
        image.save(filename)
        print(f"[{i+1}/{len(styles)}] Generated: {filename}")

# Usage
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

batch_generate(
    pipe,
    base_prompt="a beautiful mountain landscape",
    styles=[
        "oil painting style",
        "watercolor painting",
        "anime style",
        "photorealistic, 8k",
        "cyberpunk neon",
        "studio ghibli style"
    ]
)

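7.2 Fixing the Seed for Reproducible Results

python

import torch

def reproducible_generation(pipe, prompt: str, seed: int = 42):
    """
    Fix the random seed so repeated runs produce the same image
    (given the same model, settings, hardware, and library versions).
    Useful for testing, comparing prompts, and sharing settings.
    """
    generator = torch.Generator("cuda").manual_seed(seed)

    image = pipe(
        prompt=prompt,
        generator=generator,
        num_inference_steps=30
    ).images[0]

    image.save(f"seed_{seed}.png")
    return image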

8. Prompt Engineering

8.1 A Prompt Structure Template

┌────────────────────────────────────────────────────────────────────────────┐
│ An all-purpose prompt formula                                              │
│                                                                            │
│  [Subject] + [Scene/Environment] + [Lighting] + [Style] + [Quality tags]   │
│                                                                            │
│  Example:                                                                  │
│  a young woman (subject)                                                   │
│  standing in a cherry blossom garden (scene)                               │
│  golden hour lighting, soft shadows (lighting)                             │
│  oil painting style (style)                                                │
│  masterpiece, best quality, highly detailed, 4k (quality tags)             │
└────────────────────────────────────────────────────────────────────────────┘
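
The formula is easy to turn into a helper that assembles the parts in order and skips any that are left empty. A minimal sketch follows; build_prompt and its parameter names are hypothetical and simply follow the slots of the formula.

```python
def build_prompt(subject, scene="", lighting="", style="", quality=()):
    """Join the non-empty parts with commas, in the order the formula lists them."""
    parts = [subject, scene, lighting, style, *quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a young woman",
    scene="standing in a cherry blossom garden",
    lighting="golden hour lighting, soft shadows",
    style="oil painting style",
    quality=("masterpiece", "best quality", "highly detailed", "4k"),
)
print(prompt)
```

Keeping the slots separate makes it simple to vary one of them, for example looping over several styles while the subject and scene stay fixed.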

8.2 Common Quality-Boosting Tags

python

QUALITY_BOOSTERS = [
    # General quality
    "masterpiece", "best quality", "highly detailed",
    "sharp focus", "8k uhd", "high resolution",

    # Lighting
    "cinematic lighting", "volumetric lighting",
    "golden hour", "dramatic lighting",

    # Photographic effects
    "bokeh", "depth of field", "film grain",
    "DSLR, 35mm lens", "RAW photo",
]

NEGATIVE_PROMPT = (
    "blurry, low quality, low resolution, "
    "deformed, distorted, disfigured, bad anatomy, "
    "bad hands, missing fingers, extra fingers, "
    "watermark, text, logo, signature"
)

9. A Complete Toolkit Class

python

"""
SDToolKit: a ready-made wrapper class around Stable Diffusion
"""
from pathlib import Path
from typing import Optional

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline


class SDToolKit:
    """Local Stable Diffusion generation toolkit"""

    def __init__(self, model_name: str = "runwayml/stable-diffusion-v1-5"):
        print(f"Loading model: {model_name} ...")
        self.pipe = StableDiffusionPipeline.from_pretrained(
            model_name,
            torch_dtype=torch.float16,
            safety_checker=None
        )
        self.pipe.to("cuda")
        self.pipe.enable_attention_slicing()
        print("Model loaded!")

    def generate(
        self,
        prompt: str,
        negative_prompt: str = "",
        width: int = 512,
        height: int = 512,
        steps: int = 30,
        cfg_scale: float = 7.5,
        seed: int = -1,
        output_path: Optional[str] = None
    ) -> Image.Image:
        """Generate one image; seed=-1 means a random seed."""
        generator = None
        if seed != -1:
            generator = torch.Generator("cuda").manual_seed(seed)

        result = self.pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            width=width,
            height=height,
            num_inference_steps=steps,
            guidance_scale=cfg_scale,
            generator=generator
        )

        image = result.images[0]

        if output_path:
            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
            image.save(output_path)
            print(f"Saved: {output_path}")

        return image

    def batch_generate(self, prompts: list, output_dir: str = "./outputs"):
        """Generate one image per prompt and save them to output_dir."""
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        images = []
        for i, prompt in enumerate(prompts):
            path = f"{output_dir}/img_{i:03d}.png"
            img = self.generate(prompt=prompt, output_path=path)
            images.append(img)
        print(f"\nBatch finished: {len(images)} images generated")
        return images


# ===== Example usage =====
if __name__ == "__main__":
    sd = SDToolKit("runwayml/stable-diffusion-v1-5")

    # Single image
    sd.generate(
        prompt="a cute cat wearing a tiny hat, masterpiece, best quality",
        negative_prompt="blurry, deformed, bad anatomy",
        output_path="outputs/cat.png"
    )

    # Batch
    sd.batch_generate([
        "a serene lake at sunrise, photorealistic, 8k",
        "a cyberpunk street, neon lights, rain, highly detailed",
        "a medieval castle on a cliff, fantasy art, epic",
        "a steaming cup of coffee on a wooden table, cozy atmosphere"
    ])

10. Troubleshooting

┌────────────────────────────────────────────────────────────────────────────────┐
│ Common problems and fixes                                                      │
├──────────────────────────┬─────────────────────────────────────────────────────┤
│ Problem                  │ Fix                                                 │
├──────────────────────────┼─────────────────────────────────────────────────────┤
│ CUDA out of memory       │ Lower the resolution; enable attention slicing      │
│                          │ or CPU offload (fewer steps will not save VRAM)     │
├──────────────────────────┼─────────────────────────────────────────────────────┤
│ All-black images         │ Check whether safety_checker flagged the image      │
│                          │ and avoid sensitive words; if not, try float32      │
├──────────────────────────┼─────────────────────────────────────────────────────┤
│ Blurry or low quality    │ Raise steps (30+); add quality tags;                │
│                          │ try a stronger model (SDXL)                         │
├──────────────────────────┼─────────────────────────────────────────────────────┤
│ Deformed hands or faces  │ Use a negative prompt; try SDXL;                    │
│                          │ repair afterwards with inpainting                   │
├──────────────────────────┼─────────────────────────────────────────────────────┤
│ Slow model downloads     │ Use a mirror site; download the .safetensors        │
│                          │ file manually to local disk                         │
├──────────────────────────┼─────────────────────────────────────────────────────┤
│ Slow generation          │ Use sdxl-turbo; lower steps;                        │
│                          │ enable xformers; upgrade the GPU                    │
└──────────────────────────┴─────────────────────────────────────────────────────┘
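
For the out-of-memory row, one pragmatic pattern is to retry at a smaller resolution automatically. The sketch below assumes the pipeline call is wrapped in a function taking a width and height, and relies on PyTorch reporting CUDA out-of-memory conditions as a RuntimeError whose message contains "out of memory". generate_with_fallback and fake_generate are hypothetical names, and the fake function only simulates the failure so the example runs without a GPU.

```python
def generate_with_fallback(generate, sizes=(1024, 768, 512)):
    """Try each resolution in turn, moving to a smaller one on an out-of-memory error.

    `generate` is any callable taking (width, height) and returning an image.
    Returns a (size, image) tuple for the first size that succeeds.
    """
    last_error = None
    for size in sizes:
        try:
            return size, generate(size, size)
        except RuntimeError as err:
            # Only swallow out-of-memory errors; anything else is re-raised untouched.
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
            print(f"{size}x{size} ran out of memory, trying a smaller size")
    raise RuntimeError("all sizes ran out of memory") from last_error


# Stand-in for a real pipeline call: pretend anything above 512 px does not fit.
def fake_generate(width, height):
    if width > 512:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"image {width}x{height}"


print(generate_with_fallback(fake_generate))
```

With a real pipeline it is usually worth calling torch.cuda.empty_cache() between attempts so memory held by the failed run is released before the retry.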

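11. Learning Path

Beginner ─────────────────────────────────────── Advanced

SD 1.5 text-to-image
  │
  ├── Learn prompt engineering
  ├── Learn to tune parameters
  │
  ▼
SDXL high-quality generation
  │
  ├── Image-to-image (img2img)
  ├── Inpainting
  │
  ▼
ControlNet precise control
  │
  ├── Canny / Depth / Pose
  ├── LoRA style adapters
  │
  ▼
Going further
  │
  ├── ComfyUI workflows
  ├── Training a custom LoRA
  ├── Video generation (SVD)
  └── Integrating into a web application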

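Summary

This guide covered the core of running Stable Diffusion locally:

- The Diffusers pure-Python API, suited to developers integrating generation into their own code
- The WebUI graphical interface, suited to getting started quickly and to API calls
- Advanced features: image-to-image, inpainting, and ControlNet
- Batch generation, VRAM optimization, and prompt engineering

Suggested next step: pick a model that interests you (SDXL is a good default), get basic text-to-image working, then move on to image-to-image and ControlNet.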