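Python's Three Musketeers of Concurrency: Choosing Among Threads, Processes, and Coroutines in Practice

A note up front: this article grew out of a post-mortem after a production incident. We were using multithreading for an image compression job; the CPU was pegged, yet throughput would not climb. That was when I realized that many developers pick a concurrency model by feel rather than by understanding. This article is my attempt to lay the subject out clearly.

1. Getting the concepts straight: what the three actually are

Before choosing a model, we need to understand how the three differ in substance, rather than memorize definitions.

Multithreading (threading): multiple threads share the memory space of one process, so switching between them is cheap. In CPython, however, the GIL (Global Interpreter Lock) means that only one thread executes Python bytecode at any given moment.

Multiprocessing (multiprocessing): each process has its own memory space and its own Python interpreter, which gives true parallel computation at the cost of higher inter-process communication (IPC) and memory overhead.

Coroutines (asyncio): cooperative concurrency inside a single thread. A coroutine hands control back explicitly at an await, switching is very cheap, and the model suits workloads with many pending I/O waits.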
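To make "hands control back at an await" concrete, here is a minimal sketch that uses only the standard library. The worker names and step counts are illustrative assumptions, not part of any system described in this article.

```python
import asyncio

async def worker(name: str, steps: int, log: list) -> None:
    """Record each step, yielding to the event loop between steps."""
    for i in range(steps):
        log.append(f"{name}-{i}")
        # The await is the only point where this coroutine can be switched out
        await asyncio.sleep(0)

async def run_demo() -> list:
    log: list = []
    # Both coroutines run in one thread; the event loop alternates between them
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log

if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

The log interleaves the two workers even though no second thread exists: each switch happens at the await, nowhere else.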

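A table makes the comparison concrete:

Dimension	Multithreading	Multiprocessing	Coroutines
Parallelism	Limited by the GIL; concurrent but not parallel for Python bytecode	True parallelism	Single-threaded concurrency, no parallelism
Memory overhead	Low (shared memory)	High (separate memory per process)	Very low
Switching overhead	Medium	High	Very low
Best suited for	I/O-bound work (good library compatibility)	CPU-bound work	High-concurrency I/O
Programming complexity	Medium (race conditions to handle)	Medium (IPC to handle)	Medium (async/await throughout)
Debugging difficulty	High (race conditions)	Medium	Medium

2. What the GIL is, and why it keeps multithreading from living up to its name

This is the key to understanding concurrency in Python. The GIL is a global lock in the CPython interpreter that ensures only one thread executes Python bytecode at a time; its purpose is to keep the interpreter's memory management (reference counting) thread-safe.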

import threading
import time

# Measure the effect of the GIL on a CPU-bound task
def cpu_task(n):
    """Pure CPU computation."""
    count = 0
    for _ in range(n):
        count += 1
    return count

# Single thread: run the task twice, back to back
start = time.perf_counter()
cpu_task(50_000_000)
cpu_task(50_000_000)
print(f"Single-threaded: {time.perf_counter() - start:.2f}s")

# Two threads (you will find this is barely faster, and sometimes slower)
start = time.perf_counter()
t1 = threading.Thread(target=cpu_task, args=(50_000_000,))
t2 = threading.Thread(target=cpu_task, args=(50_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Multithreaded: {time.perf_counter() - start:.2f}s")

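The result is often surprising: the multithreaded version is no faster. Because of GIL contention and thread-switching overhead, it typically takes about as long as the single-threaded run, and sometimes longer.

The GIL is released while a thread waits on I/O, however, and that is the fundamental reason multithreading remains valuable for I/O-bound work.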
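The I/O side of that claim can be checked with a small counterpart to the CPU test. This is a sketch under one assumption: time.sleep stands in for a blocking I/O call, since it also releases the GIL while waiting. The function names are mine.

```python
import threading
import time

def io_task(seconds: float) -> None:
    """Stand-in for a blocking I/O call; the GIL is released during the wait."""
    time.sleep(seconds)

def run_sequential(n: int, seconds: float) -> float:
    """Run n waits one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        io_task(seconds)
    return time.perf_counter() - start

def run_threaded(n: int, seconds: float) -> float:
    """Run n waits in n threads and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=io_task, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(4, 0.2):.2f}s")
```

Here the waits overlap, so the threaded run should take roughly as long as one wait rather than four.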

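3. Selection strategies for three workload types

3.1 CPU-bound → multiprocessing

Typical scenarios: image compression and transcoding, video processing, large-scale numerical computation, encryption and decryption, machine-learning inference.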

from multiprocessing import Pool
from PIL import Image
import os

def compress_image(args):
    """CPU-bound: compress one image."""
    input_path, output_path, quality = args
    with Image.open(input_path) as img:
        # JPEG has no alpha channel or palette mode, so convert to RGB first
        if img.mode in ('RGBA', 'P'):
            img = img.convert('RGB')
        img.save(output_path, 'JPEG', quality=quality, optimize=True)
    return output_path

def batch_compress(image_list):
    """Compress images in parallel with a process pool."""
    # One worker per CPU core, as reported by os.cpu_count()
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(compress_image, image_list)
    return results

# Usage example. The __main__ guard is required on platforms that start
# workers with "spawn" (Windows, macOS), where each worker re-imports this module.
if __name__ == '__main__':
    tasks = [
        ('input/photo1.png', 'output/photo1.jpg', 75),
        ('input/photo2.png', 'output/photo2.jpg', 75),
        # ... more tasks
    ]
    batch_compress(tasks)

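Key points for multiprocessing:

Passing data between processes incurs serialization (pickle) overhead, so avoid passing large objects
Prefer Pool.map to managing processes by hand; it is safer
Set the number of processes to roughly the number of CPU cores; going higher usually lowers throughput because of scheduling overhead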
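Serialization cost also depends on how many round trips you make, not only on object size. Pool.map batches items by default, but ProcessPoolExecutor.map sends one item per task unless you pass chunksize. The sketch below shows the idea; is_prime is an illustrative workload and pick_chunksize is a hypothetical heuristic of mine, not a library function.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor

def is_prime(n: int) -> bool:
    """Small CPU-bound task: trial division."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def pick_chunksize(n_items: int, n_workers: int, chunks_per_worker: int = 4) -> int:
    """Hypothetical heuristic: a few chunks per worker, so each pickle round
    trip carries many items while the load still balances across workers."""
    return max(1, math.ceil(n_items / (n_workers * chunks_per_worker)))

def count_primes(numbers: list, n_workers: int) -> int:
    """Count primes in numbers using a process pool with batched tasks."""
    chunksize = pick_chunksize(len(numbers), n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        # chunksize batches items, so IPC cost is paid per chunk, not per item
        return sum(executor.map(is_prime, numbers, chunksize=chunksize))

if __name__ == "__main__":
    workers = os.cpu_count() or 1
    print(count_primes(list(range(100_000)), workers))
```

For tasks as small as these, the batched version should spend far less time on IPC than one-item-per-task submission; the right chunk size for a real workload is something to measure.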

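3.2 I/O-bound → coroutines (first choice) or multithreading

Typical scenarios: HTTP requests, database queries, file reads and writes, message-queue consumers.

Coroutines are the strongest option for high-concurrency I/O, because switching between them happens in user space and costs very little: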

import asyncio
import aiohttp
import time

async def fetch_url(session, url):
    """Asynchronous HTTP request."""
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
        return await resp.text()

async def fetch_all(urls):
    """Fetch all URLs concurrently."""
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        # gather runs all tasks concurrently; failures come back as exception objects
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return results

# Comparison: sequential requests vs. concurrent coroutines
urls = ["https://httpbin.org/delay/1" for _ in range(10)]

start = time.perf_counter()
asyncio.run(fetch_all(urls))
print(f"10 concurrent requests with coroutines: {time.perf_counter() - start:.2f}s")
# Typically about 1-2 s, depending on the network; sequential requests need 10 s or more

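When should you use threads instead of coroutines? When a library you depend on has no async support (an older database driver, or certain SDKs), multithreading is the more pragmatic choice: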

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_sync(url):
    """Synchronous request, run concurrently through a thread pool."""
    return requests.get(url, timeout=10).text

# urls is the list defined in the previous example
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(executor.map(fetch_sync, urls))
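One caveat with executor.map: when you iterate its results, the first failed call re-raises its exception and you lose the remaining results. If a single bad URL should not abort the batch, submit plus as_completed lets you handle each outcome separately. A minimal sketch follows; fake_fetch is a hypothetical stand-in for the requests call so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all_sync(fetch, urls, max_workers=20):
    """Run fetch(url) in a thread pool; collect successes and failures separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad URL must not abort the batch
                errors[url] = exc
    return results, errors

def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url, timeout=10).text."""
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"body of {url}"

if __name__ == "__main__":
    ok, failed = fetch_all_sync(fake_fetch, ["https://a.example", "https://bad.example"])
    print(ok)
    print(failed)
```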

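3.3 Mixed workloads → layered architecture

This is the scenario that tests design skill the most. The core idea for mixed workloads is to hand each kind of task to the concurrency model that handles it best, and to decouple the stages with queues.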
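As a rough illustration of "decouple the stages with queues", here is a sketch of a three-stage pipeline built on asyncio.Queue. The stage names, the sentinel object, and the queue size are my assumptions; asyncio.sleep(0) stands in for real I/O, and a real CPU-bound transform would be sent to a process pool rather than called inline.

```python
import asyncio

SENTINEL = object()  # marks the end of the stream

async def producer(items, out_q: asyncio.Queue) -> None:
    """Stage 1 (I/O stand-in): push raw items downstream."""
    for item in items:
        await asyncio.sleep(0)  # placeholder for a download
        await out_q.put(item)
    await out_q.put(SENTINEL)

async def transformer(in_q: asyncio.Queue, out_q: asyncio.Queue, func) -> None:
    """Stage 2: apply func to each item and pass the result on."""
    while (item := await in_q.get()) is not SENTINEL:
        await out_q.put(func(item))
    await out_q.put(SENTINEL)

async def consumer(in_q: asyncio.Queue, sink: list) -> None:
    """Stage 3 (I/O stand-in): store results."""
    while (item := await in_q.get()) is not SENTINEL:
        await asyncio.sleep(0)  # placeholder for a database write
        sink.append(item)

async def run_pipeline(items, func, maxsize: int = 8) -> list:
    """Wire the stages together; bounded queues apply backpressure."""
    q1, q2 = asyncio.Queue(maxsize), asyncio.Queue(maxsize)
    sink: list = []
    await asyncio.gather(
        producer(items, q1),
        transformer(q1, q2, func),
        consumer(q2, sink),
    )
    return sink

if __name__ == "__main__":
    print(asyncio.run(run_pipeline([1, 2, 3], lambda x: x * x)))
```

Because each stage only sees its input and output queue, a slow stage can be scaled or swapped without touching the others, and the bounded queues keep a fast producer from piling up unbounded data in memory.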

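4. Case study: a task system combining image processing, network requests, and database storage

This is a real business scenario: a user submits an image → download the original → compress it → store the result in a database. The three stages are I/O-bound, CPU-bound, and I/O-bound respectively. How should they be split?

Architecture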

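[Coroutine layer] Network download (asyncio + aiohttp)
      ↓ raw image bytes, handed over through the executor's internal queue
[Process layer] Image compression (ProcessPoolExecutor)
      ↓ compressed bytes, returned through the executor's result queue
[Coroutine layer] Database storage (asyncio + asyncpg)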

Full implementation

import asyncio
import aiohttp
import asyncpg
from concurrent.futures import ProcessPoolExecutor
from io import BytesIO
from PIL import Image
import os

# ============ CPU-bound layer: image compression (runs in a child process) ============

def compress_image_bytes(image_data: bytes, quality: int = 75) -> bytes:
    """
    Compress an image inside a child process.
    Note: this function must be defined at module level so that pickle can serialize it.
    """
    with Image.open(BytesIO(image_data)) as img:
        if img.mode in ('RGBA', 'P'):
            img = img.convert('RGB')
        output = BytesIO()
        img.save(output, format='JPEG', quality=quality, optimize=True)
        return output.getvalue()


# ============ I/O layer: download and storage (runs in coroutines) ============

async def download_image(session: aiohttp.ClientSession, url: str) -> bytes:
    """Download an image asynchronously."""
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.read()

async def save_to_db(pool: asyncpg.Pool, url: str, data: bytes):
    """Write one record to the database asynchronously."""
    async with pool.acquire() as conn:
        await conn.execute(
            "INSERT INTO images (source_url, compressed_data, size) VALUES ($1, $2, $3)",
            url, data, len(data)
        )

# ============ Core scheduler: chaining the three stages ============

async def process_pipeline(urls: list[str], db_dsn: str):
    """
    Mixed-concurrency pipeline:
    - download: concurrent coroutines
    - compress: process pool (CPU-bound)
    - store: concurrent coroutines
    """
    # Initialize resources
    db_pool = await asyncpg.create_pool(db_dsn, min_size=5, max_size=20)
    process_executor = ProcessPoolExecutor(max_workers=os.cpu_count())
    # Inside a coroutine, get_running_loop() is the supported way to reach the loop
    loop = asyncio.get_running_loop()

    try:
        async with aiohttp.ClientSession() as session:
            # Stage 1: download all images concurrently
            print(f"Downloading {len(urls)} images...")
            download_tasks = [download_image(session, url) for url in urls]
            raw_images = await asyncio.gather(*download_tasks, return_exceptions=True)

            # Stage 2: submit compression to the process pool (CPU-bound, event loop stays free)
            print("Submitting compression tasks to the process pool...")
            compress_tasks = []
            valid_urls = []  # URLs whose download succeeded

            for url, raw in zip(urls, raw_images):
                if isinstance(raw, Exception):
                    print(f"Download failed: {url} -> {raw}")
                    continue
                valid_urls.append(url)
                # run_in_executor wraps the blocking call in an awaitable Future
                task = loop.run_in_executor(
                    process_executor,
                    compress_image_bytes,
                    raw,
                    75
                )
                compress_tasks.append(task)

            compressed_images = await asyncio.gather(*compress_tasks, return_exceptions=True)

            # Stage 3: write to the database concurrently
            print("Writing to the database...")
            save_tasks = []
            for url, compressed in zip(valid_urls, compressed_images):
                if isinstance(compressed, Exception):
                    print(f"Compression failed: {url} -> {compressed}")
                    continue
                save_tasks.append(save_to_db(db_pool, url, compressed))

            await asyncio.gather(*save_tasks)
    finally:
        # Release resources even if a stage raises
        process_executor.shutdown(wait=True)
        await db_pool.close()
    print("All tasks finished")


# ============ Entry point ============

if __name__ == '__main__':
    image_urls = [
        "https://example.com/image1.jpg",
        "https://example.com/image2.jpg",
        # ...
    ]
    DB_DSN = "postgresql://user:password@localhost/mydb"
    asyncio.run(process_pipeline(image_urls, DB_DSN))

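Key design decisions

Why use run_in_executor instead of calling the process pool directly?

loop.run_in_executor is the bridge between coroutines and blocking code. It runs the blocking call in a thread pool or process pool and returns an awaitable Future, so the event loop keeps serving other coroutines while it waits instead of being blocked.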
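The difference is easy to observe with a heartbeat coroutine. In the sketch below, blocking_work is a stand-in for something like image compression (time.sleep is my assumption for the blocking part), and passing None as the executor selects the loop's default thread pool; a ProcessPoolExecutor would be passed the same way.

```python
import asyncio
import time

def blocking_work(seconds: float) -> str:
    """Stand-in for a blocking call (for example image compression)."""
    time.sleep(seconds)
    return "done"

async def heartbeat(ticks: list, interval: float) -> None:
    """Append a timestamp every `interval` seconds until cancelled."""
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)

async def run_demo(offload: bool) -> int:
    """Return how many heartbeats fired while blocking_work was running."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, 0.05))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    if offload:
        loop = asyncio.get_running_loop()
        # None selects the loop's default ThreadPoolExecutor
        await loop.run_in_executor(None, blocking_work, 0.3)
    else:
        blocking_work(0.3)  # runs on the event loop thread and stalls it
    during = len(ticks) - before
    hb.cancel()
    return during

if __name__ == "__main__":
    print("heartbeats while blocked inline:", asyncio.run(run_demo(offload=False)))
    print("heartbeats while offloaded:     ", asyncio.run(run_demo(offload=True)))
```

Called inline, the blocking function silences the heartbeat for its whole duration; offloaded, the heartbeat keeps firing because the event loop is only waiting on a Future.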

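Why use a connection pool for the database?

Each acquire() borrows a connection from the pool and returns it automatically when the async with block exits. This avoids the cost of repeatedly opening and closing connections, and the pool's max_size caps the number of connections so that high concurrency cannot exhaust them.

5. Performance tuning: a few easily overlooked details

5.1 Sizing the worker pool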

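import asyncio
import os

# CPU-bound: workers = number of CPU cores (os.cpu_count() can return None)
cpu_workers = os.cpu_count() or 1

# I/O-bound (thread pool): can go higher, but more is not always better
# Rule of thumb: 2x to 5x the core count
io_workers = cpu_workers * 4

# Coroutines: cap concurrency with a semaphore to avoid exhausting resources
semaphore = asyncio.Semaphore(100)  # at most 100 requests in flight

async def fetch_with_limit(session, url):
    # fetch_url is the coroutine defined in section 3.2
    async with semaphore:
        return await fetch_url(session, url)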
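To confirm that a semaphore really caps the number of in-flight coroutines, you can record the peak yourself. This is a sketch with made-up names; asyncio.sleep stands in for the network request.

```python
import asyncio

async def limited_call(sem: asyncio.Semaphore, state: dict) -> None:
    """Do one unit of simulated I/O while holding a semaphore slot."""
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for a network request
        state["active"] -= 1

async def measure_peak(n_tasks: int, limit: int) -> int:
    """Launch n_tasks coroutines and return the highest concurrency observed."""
    # Create the semaphore inside the running loop: on Python < 3.10 a semaphore
    # built at import time can bind to a different loop than asyncio.run() uses
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(limited_call(sem, state) for _ in range(n_tasks)))
    return state["peak"]

if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(measure_peak(n_tasks=50, limit=5)))
```

All 50 coroutines are scheduled at once, but only as many as the limit allows ever sit inside the async with block together.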

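5.2 Avoiding accidental blocking inside coroutines

This is the most common coroutine pitfall: calling a synchronous, blocking operation inside an async function stalls the entire event loop.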

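import asyncio
import aiofiles  # asynchronous file I/O

# ❌ Wrong: synchronous file access inside a coroutine
async def bad_read(path):
    with open(path, 'r') as f:  # this blocks the event loop!
        return f.read()

# ✅ Right: use an asynchronous file library
async def good_read(path):
    async with aiofiles.open(path, 'r') as f:
        return await f.read()

# ✅ Alternative: wrap the blocking code with run_in_executor
def _read_sync(path):
    """Blocking helper; the file is closed when the with block exits."""
    with open(path, 'r') as f:
        return f.read()

async def also_good_read(path):
    loop = asyncio.get_running_loop()
    # Pass the function and its argument separately so that both open() and
    # read() run in the default thread pool, not on the event loop thread
    return await loop.run_in_executor(None, _read_sync, path)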
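On Python 3.9 and later, asyncio.to_thread is a shorter way to write the same thread-pool offload. A minimal sketch, with read_text_async being a name of my own choosing:

```python
import asyncio
from pathlib import Path

async def read_text_async(path: str, encoding: str = "utf-8") -> str:
    """Read a file without blocking the event loop (requires Python 3.9+)."""
    # to_thread runs the callable in the default thread pool and awaits the result;
    # keyword arguments are forwarded to the callable
    return await asyncio.to_thread(Path(path).read_text, encoding=encoding)

if __name__ == "__main__":
    # Reads this script's own source as a quick smoke test
    print(len(asyncio.run(read_text_async(__file__))))
```

The callable is passed uncalled, so the blocking open and read both happen in the worker thread.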

5.3 Serialization overhead in inter-process communication

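from multiprocessing import Pool, shared_memory
import numpy as np

def process_array(arr):
    return arr.sum()

if __name__ == '__main__':
    large_array = np.random.rand(10_000_000)  # about 80 MB of float64

    # ❌ Passing a large numpy array: pickle serialization is very expensive
    with Pool(4) as pool:
        # Every task serializes and deserializes the whole array, which is slow
        result = pool.map(process_array, [large_array] * 4)

    # ✅ Use shared memory (Python 3.8+): copy the data once into a named block
    shm = shared_memory.SharedMemory(create=True, size=large_array.nbytes)
    shared_arr = np.ndarray(large_array.shape, dtype=large_array.dtype, buffer=shm.buf)
    np.copyto(shared_arr, large_array)
    # Child processes attach through shm.name; no serialization is needed

    # When every process is done: drop the array view, then close and free the block
    del shared_arr
    shm.close()
    shm.unlink()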
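The snippet above stops before the child side, so here is a sketch of how a worker might attach to the block by name. To keep it free of third-party dependencies it uses struct and memoryview instead of numpy; sum_slice, parallel_sum, and the way the work is split are all assumptions made for illustration.

```python
import struct
from multiprocessing import Pool, shared_memory

DOUBLE = 8  # bytes per C double

def sum_slice(shm_name: str, start: int, stop: int) -> float:
    """Worker: attach to the block by name and sum the doubles in [start, stop)."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        view = shm.buf[start * DOUBLE : stop * DOUBLE].cast("d")
        try:
            return sum(view)
        finally:
            view.release()  # views must be released before close()
    finally:
        shm.close()

def parallel_sum(values: list, n_workers: int = 2) -> float:
    """Parent: copy values into shared memory once, then pass only the name."""
    n = len(values)
    if n == 0:
        return 0.0
    shm = shared_memory.SharedMemory(create=True, size=n * DOUBLE)
    try:
        struct.pack_into(f"{n}d", shm.buf, 0, *values)
        step = -(-n // n_workers)  # ceiling division
        jobs = [(shm.name, i, min(i + step, n)) for i in range(0, n, step)]
        with Pool(n_workers) as pool:
            # Each job pickles a short name and two ints, not the data itself
            return sum(pool.starmap(sum_slice, jobs))
    finally:
        shm.close()
        shm.unlink()  # only the creating process frees the block

if __name__ == "__main__":
    print(parallel_sum([float(i) for i in range(1000)]))
```

The design point is that only the block's name and two integers cross the process boundary per task; the payload is written once by the parent and read in place by the workers.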

6. A decision tree for choosing a model

When I face a new task, this is how I think through the choice:

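Does the task spend much of its time waiting (network / disk / database)?
├── Yes → Is there a mature async library for it?
│         ├── Yes → use coroutines (asyncio)
│         └── No → use threads (ThreadPoolExecutor)
└── No → Is it pure CPU computation?
          ├── Yes → use processes (ProcessPoolExecutor)
          └── Mixed → layered architecture: coroutines for I/O, a process pool for CPU work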

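7. Summary

Back to the production incident from the start of this article, the root cause is clear: we were using threads for CPU-bound image compression, and the GIL made the extra threads useless. After switching to multiprocessing, throughput rose nearly fourfold.

None of the three models is better in absolute terms; what matters is which one fits:

CPU-bound → multiprocessing, which sidesteps the GIL and runs truly in parallel
I/O-bound → coroutines first, multithreading as the fallback
Mixed workloads → a layered architecture, decoupled with queues, each layer doing what it does best

One last question for you: have you been bitten by a concurrency choice in your own projects? Was it the GIL, or a blocking call that slipped into a coroutine? Feel free to share in the comments; hard-won lessons like these are often worth more than any tutorial.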

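References

Python official documentation: asyncio
Python official documentation: multiprocessing
PEP 3156: Asynchronous IO Support Rebooted: the "asyncio" Module
Recommended books: Fluent Python (2nd edition), Chapter 19 on concurrency; Python Cookbook, Chapter 12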