Stable Diffusion in Practice: From Installation to Image Generation

Preface

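Stable Diffusion is currently one of the most popular open-source image generation models. It generates high-quality images from text descriptions and is widely used in creative design, game development, and related fields.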

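I have used Stable Diffusion in several projects, for tasks ranging from simple image generation to style transfer. This post is a complete hands-on guide based on that experience.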

Environment Setup

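# Create a virtual environment
conda create -n sd python=3.10
conda activate sd

# Install dependencies (the cu121 index URL selects the CUDA 12.1 builds of PyTorch)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install diffusers transformers accelerate safetensors
pip install datasets  # used in the fine-tuning section
pip install gradio  # used for the web UI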
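
Before moving on, it can help to confirm that the installation worked. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `find_missing` helper are names made up for this illustration, and the list simply mirrors the packages this guide imports.

```python
import importlib.util

# Import names of the packages this guide relies on (PIL is provided by Pillow).
REQUIRED = ["torch", "diffusers", "transformers", "accelerate",
            "safetensors", "datasets", "gradio", "PIL"]


def find_missing(packages):
    """Return the names from `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it inside the activated `sd` environment; anything it reports as missing can be installed with `pip`.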

Basic Usage

Text to Image

from diffusers import StableDiffusionPipeline
import torch

# Load the model in half precision and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Generate an image
prompt = "a beautiful sunset over the ocean, golden hour, photorealistic"
image = pipe(prompt).images[0]

# Save the image
image.save("sunset.png")
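
The pipeline call also accepts `height` and `width` arguments, and both are expected to be divisible by 8. As a small sketch of how a requested size could be adjusted before calling the pipeline, the hypothetical helper `snap_to_multiple` below (not part of diffusers) rounds a value down to the nearest multiple of 8.

```python
def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round `value` down to the nearest positive multiple of `multiple`."""
    if value < multiple:
        raise ValueError(f"value must be at least {multiple}, got {value}")
    return (value // multiple) * multiple


width, height = snap_to_multiple(770), snap_to_multiple(515)
print(width, height)  # 768 512
```

The results can then be passed along, for example `pipe(prompt, height=height, width=width)`.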

Controlling Generation Parameters

from typing import Optional

import torch
from PIL import Image


def generate_image(
    prompt: str,
    negative_prompt: Optional[str] = None,
    num_inference_steps: int = 50,
    guidance_scale: float = 7.5,
    seed: Optional[int] = None,
) -> Image.Image:
    """Generate one image with the `pipe` loaded in the previous section."""
    # Compare against None so that seed=0 is still honored
    generator = None
    if seed is not None:
        generator = torch.Generator("cuda").manual_seed(seed)

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]

    return image


# Usage example
image = generate_image(
    prompt="a cute cat playing with a ball",
    negative_prompt="ugly, blurry, low quality",
    num_inference_steps=30,
    guidance_scale=7.5,
    seed=42
)
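
To give some intuition for `guidance_scale`: under classifier-free guidance, each denoising step combines an unconditional noise prediction with a text-conditioned one, and the scale controls how far the result is pushed toward the text-conditioned direction. The sketch below shows that combination on plain Python lists; `apply_guidance` and the toy values are illustrative only, since the real pipeline does this on tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned predictions element-wise."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0, -1.0]
text = [1.0, 1.0, 0.0]

print(apply_guidance(uncond, text, 1.0))  # [1.0, 1.0, 0.0]
print(apply_guidance(uncond, text, 7.5))  # [7.5, 1.0, 6.5]
```

With a scale of 1.0 the result equals the text-conditioned prediction; larger values amplify the difference between the two predictions, which tends to follow the prompt more closely at the cost of variety.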

Advanced Techniques

Image to Image

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load the image-to-image pipeline
img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load the input image and resize it to the model's native resolution
init_image = Image.open("input.jpg").convert("RGB")
init_image = init_image.resize((512, 512))

# Generate; strength (0 to 1) controls how far the result may drift from the input
prompt = "turn this photo into a painting in the style of Van Gogh"
image = img2img_pipe(
    prompt=prompt,
    image=init_image,
    strength=0.75
).images[0]

image.save("output.png")
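
The `strength` value also affects how many denoising steps actually run. As far as I understand the diffusers implementation, the input image is noised to an intermediate point and only the remaining portion of the schedule is executed. The helper `img2img_steps` below is a hypothetical sketch of that arithmetic, not a library function.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps executed for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(img2img_steps(50, 0.75))  # 37
print(img2img_steps(50, 0.5))   # 25
print(img2img_steps(50, 1.0))   # 50
```

So with 50 steps and `strength=0.75`, roughly 37 steps are executed. Lower strength values run faster and stay closer to the input image.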

Depth Guidance

import torch
from diffusers import StableDiffusionDepth2ImgPipeline

# Load the depth-conditioned pipeline
depth_pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16
).to("cuda")

# Guide generation with a depth map (init_image comes from the previous section)
prompt = "a futuristic city skyline"
image = depth_pipe(
    prompt=prompt,
    image=init_image,
    depth_map=None  # None lets the pipeline estimate depth from the input image
).images[0]
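
If you supply your own depth map instead of passing `None`, the raw values from a depth estimator are usually in arbitrary units, so a common preparation step is min-max normalization. The sketch below rescales a 2D map to the range [-1, 1] using plain lists; `normalize_depth` is a hypothetical helper, and whether your pipeline version expects exactly this range is an assumption worth checking against its documentation.

```python
def normalize_depth(depth_map):
    """Min-max normalize a 2D depth map (list of rows) to the range [-1, 1]."""
    flat = [value for row in depth_map for value in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == d_min:
        # A constant map carries no depth information; map everything to 0.
        return [[0.0 for _ in row] for row in depth_map]
    scale = d_max - d_min
    return [[2.0 * (value - d_min) / scale - 1.0 for value in row] for row in depth_map]


depth = [[0.5, 1.0], [2.0, 4.5]]
print(normalize_depth(depth))  # [[-1.0, -0.75], [-0.25, 1.0]]
```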

Fine-Tuning the Model

Preparing the Dataset

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("lambdalabs/pokemon-blip-captions")

# Preprocess: convert each image to RGB, resize it to 512x512, and keep its caption
def preprocess(examples):
    images = [image.convert("RGB").resize((512, 512)) for image in examples["image"]]
    return {"images": images, "captions": examples["text"]}

dataset = dataset.map(preprocess, batched=True)
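
Note that resizing directly to 512x512 distorts images that are not square. One common alternative is to center-crop to a square first. The sketch below computes the crop box only; `center_crop_box` is a hypothetical helper, and the returned tuple follows the `(left, upper, right, lower)` convention used by PIL's `Image.crop`.

```python
def center_crop_box(width: int, height: int):
    """Return (left, upper, right, lower) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


print(center_crop_box(1280, 720))  # (280, 0, 1000, 720)
print(center_crop_box(512, 512))   # (0, 0, 512, 512)
```

Inside `preprocess`, this could be used as `image.crop(center_crop_box(*image.size)).resize((512, 512))`.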

Training Script

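from diffusers import StableDiffusionPipeline
from diffusers.training_utils import set_seed

# Fix the random seed for reproducibility
set_seed(42)

# Load the model
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id)

# Training hyperparameters
training_args = {
    "output_dir": "./pokemon-model",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "num_train_epochs": 10,
    "logging_steps": 10,
    "save_steps": 100
}

# Start training (simplified example).
# diffusers does not provide a Trainer class, so `trainer` is a placeholder here.
# In practice the training loop comes from a script such as
# examples/text_to_image/train_text_to_image.py in the diffusers repository.
# trainer.train()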
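
To see how these hyperparameters interact, it helps to work out the effective batch size and the number of optimizer steps. The sketch below is an approximation under stated assumptions: `training_schedule` is a hypothetical helper, the dataset size of 800 is a made-up figure for illustration, and real training scripts may round slightly differently.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Estimate the effective batch size and the number of optimizer steps."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
    }


# 800 is a hypothetical dataset size, not the size of the dataset used above.
print(training_schedule(num_examples=800, per_device_batch_size=4,
                        grad_accum_steps=4, num_epochs=10))
# {'effective_batch_size': 16, 'steps_per_epoch': 50, 'total_steps': 500}
```

With a batch size of 4 and 4 accumulation steps, each optimizer update sees 16 images, so a `save_steps` of 100 would write a checkpoint every 1,600 images.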

Web UI Deployment

import gradio as gr

def generate(prompt, negative_prompt, steps, scale):
    """Generate an image with the `pipe` loaded in the Basic Usage section."""
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt or None,  # treat an empty textbox as "no negative prompt"
        num_inference_steps=int(steps),  # sliders may return floats; the pipeline expects an int
        guidance_scale=scale
    ).images[0]
    return image

# Build the interface
with gr.Blocks() as demo:
    gr.Markdown("# Stable Diffusion Demo")
    
    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(label="Prompt")
            negative_prompt = gr.Textbox(label="Negative Prompt")
            steps = gr.Slider(minimum=10, maximum=100, value=50, step=1, label="Steps")
            scale = gr.Slider(minimum=1, maximum=20, value=7.5, step=0.5, label="Guidance Scale")
            generate_btn = gr.Button("Generate")
        
        with gr.Column():
            output = gr.Image(label="Output")
    
    generate_btn.click(generate, inputs=[prompt, negative_prompt, steps, scale], outputs=output)

demo.launch()
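
Because the UI passes user input straight to the pipeline, it can be worth validating the values first. The following is a minimal sketch of such a step; `sanitize_inputs` is a hypothetical helper, and the clamping ranges simply mirror the slider bounds used above.

```python
from typing import Optional, Tuple


def sanitize_inputs(prompt: str, negative_prompt: str, steps: float,
                    scale: float) -> Tuple[str, Optional[str], int, float]:
    """Clean up raw UI values before they are passed to the pipeline."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("Prompt must not be empty")
    # An empty negative prompt becomes None
    negative = negative_prompt.strip() or None
    # Clamp to the same ranges as the sliders
    steps = max(10, min(100, int(round(steps))))
    scale = max(1.0, min(20.0, float(scale)))
    return prompt, negative, steps, scale


print(sanitize_inputs("  a cat  ", "", 30.0, 7.5))  # ('a cat', None, 30, 7.5)
```

Calling this at the top of `generate` keeps malformed input from reaching the model, and raising `gr.Error` instead of `ValueError` would surface the message in the browser.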

Common Issues

Out of GPU Memory

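# Option 1: attention slicing computes attention in chunks,
# trading some speed for lower peak memory
pipe.enable_attention_slicing()

# Option 2: model CPU offload (requires accelerate) keeps each
# submodel on the CPU until it is needed
pipe.enable_model_cpu_offload()

# Option 3: reduce the batch size by generating one image per call
image = pipe(prompt, num_images_per_prompt=1).images[0]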
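
To see why resolution and batch size matter so much, consider a rough estimate of the self-attention score matrix at the UNet's highest-resolution layer. The sketch below assumes an 8x downsampled latent, 8 attention heads, and fp16 values, and it assumes the full matrix is materialized at once. It ignores model weights and all other activations, and memory-efficient attention implementations avoid allocating this matrix altogether, so treat the numbers as an illustration of scaling rather than a measurement. `attention_scores_bytes` is a hypothetical helper.

```python
def attention_scores_bytes(height, width, batch_size=1, num_heads=8,
                           bytes_per_value=2, downsample=8):
    """Rough size in bytes of one full self-attention score matrix."""
    tokens = (height // downsample) * (width // downsample)
    return batch_size * num_heads * tokens * tokens * bytes_per_value


for size in (512, 768):
    mib = attention_scores_bytes(size, size) / 2**20
    print(f"{size}x{size}: {mib:.0f} MiB")
# 512x512: 256 MiB
# 768x768: 1296 MiB
```

The cost grows with the square of the token count, so going from 512 to 768 pixels per side multiplies this term by about five. Processing the attention in slices lowers the peak accordingly.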

Poor Generation Quality

# Tips for improving quality
# 1. Use more inference steps (a higher num_inference_steps)
# 2. Tune guidance_scale
# 3. Add a detailed negative prompt
# 4. Use a stronger model (such as SDXL)
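
Since prompts are just comma-separated strings, assembling them programmatically makes it easier to try variations consistently. The sketch below uses two hypothetical helpers, `build_prompt` and `build_negative_prompt`; the modifier terms are examples only, not recommended values.

```python
def build_prompt(subject, modifiers=()):
    """Join a subject and optional style modifiers into one prompt string."""
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


def build_negative_prompt(terms):
    """Join negative terms, lowercased and de-duplicated, keeping their order."""
    seen = []
    for term in terms:
        term = term.strip().lower()
        if term and term not in seen:
            seen.append(term)
    return ", ".join(seen)


print(build_prompt("a cute cat playing with a ball", ["soft lighting", "35mm photo"]))
# a cute cat playing with a ball, soft lighting, 35mm photo
print(build_negative_prompt(["ugly", "blurry", "Blurry", "low quality"]))
# ugly, blurry, low quality
```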

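Summary

Stable Diffusion is a powerful image generation tool:

Basic usage: simple text-to-image generation
Advanced techniques: image-to-image and depth guidance
Fine-tuning: adapting the model to a specific style or subject
Deployment: building a web application

Key takeaways:

Prompt quality directly affects the generated result
The negative prompt matters
Tuning parameters takes experience
A GPU with more VRAM can noticeably speed up generation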