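Thread Pools and Process Pools: Efficient Concurrency with concurrent.futures

1. Introduction

In Python development you have almost certainly run into situations like these: you need to crawl 1,000 web pages, and requesting them one at a time in a for loop is painfully slow; you need to process a million rows of data and want threads to speed things up, but creating and managing 1,000 threads by hand is a disaster; you wrote a CPU-bound program, started a bunch of threads, and found that performance actually got worse.

These problems share a root cause: you are not making good use of the basic infrastructure of concurrent programming, namely thread pools and process pools.

The concurrent.futures module is the high-level concurrency interface in Python's standard library. Its uniform, compact API hides the details of managing threads and processes, so nearly identical code can run on multiple threads or on multiple processes.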

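```python
# Thread pool for an I/O-bound task
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["https://httpbin.org/delay/0.5"] * 10  # placeholder URLs for illustration

def fetch_url(url):
    return requests.get(url, timeout=10).status_code

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(fetch_url, urls))
```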

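```python
# Process pool for a CPU-bound task (the code is almost the same!)
from concurrent.futures import ProcessPoolExecutor

def cpu_task(n):
    return sum(i * i for i in range(n))

# The guard is required on platforms that spawn worker processes
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(cpu_task, [1000000] * 4))
```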

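A uniform API, compact code, and powerful features: that is the appeal of concurrent.futures. This article walks through the module from the ground up.

2. Why Do We Need Thread Pools and Process Pools?

2.1 The pain points of going without a pool

If you create and manage threads by hand, you run into problems like these: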

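```python
# The pain of creating threads by hand
import threading

def task(i):
    # do some work
    pass

# Pain point 1: 1,000 threads, each reserving its own stack
# (commonly up to 8 MB of virtual address space on Linux)
threads = [threading.Thread(target=task, args=(i,)) for i in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Up to ~8 GB of address space reserved for stacks, plus scheduling
# overhead; at larger scales this can exhaust system resources
```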

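High creation and teardown cost: creating and destroying threads over and over drags performance down.
Complex management: you have to maintain the thread list yourself and wait for every thread to finish.
Uncontrolled resources: unbounded thread creation can exhaust memory or bring the system down.
Awkward result collection: you need a queue or a similar communication mechanism to gather return values.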
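
To make the last point concrete, here is a minimal sketch of what collecting return values from raw threads can look like. The `square` function and the `results` queue are illustrative names chosen for this sketch, not part of the original example.

```python
import queue
import threading

def square(x):
    return x * x

# Thread.join() does not hand back a return value, so each worker
# pushes its result into a shared, thread-safe queue instead.
results = queue.Queue()

def worker(x):
    results.put((x, square(x)))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the queue once every thread has finished.
collected = dict(results.get() for _ in threads)
print(collected)
```

All of this bookkeeping (the queue, the wrapper function, the start and join loops) is what an executor takes over.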

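2.2 The core idea behind thread pools and process pools

The core idea is simple: create a set of reusable worker threads or processes, hand each incoming task to a free worker, and return the worker to the pool when the task is done.

An analogy: a thread pool is like a restaurant's service staff. The restaurant does not hire a new waiter every time a customer walks in; it trains a team in advance and assigns a waiter as customers arrive. That saves the cost of hiring and training, and the team can still absorb peak hours.

A process pool is more like a special forces unit: every member is independent, with their own equipment and resources, which suits tasks that are self-contained and heavy.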
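
Worker reuse can be observed directly. The sketch below is only an illustration: the task function `which_thread` and the `waiter` name prefix are made up for this example. It runs 20 tasks on a pool of 3 threads and records which thread served each task.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def which_thread(_):
    time.sleep(0.01)  # simulate a little work
    return threading.current_thread().name

with ThreadPoolExecutor(max_workers=3, thread_name_prefix="waiter") as executor:
    names = list(executor.map(which_thread, range(20)))

distinct_workers = set(names)
print(len(names), "tasks were served by", len(distinct_workers), "threads")
```

However many tasks are submitted, the number of distinct thread names never exceeds `max_workers`.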

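2.3 What concurrent.futures gives you

Uniform API: ThreadPoolExecutor and ProcessPoolExecutor implement the same Executor interface, so switching between them takes a one-line change.
Automatic management: worker lifecycle, the task queue, and resource cleanup are all handled for you.
Future objects: a clean way to query the state, result, and exception of an asynchronous task.
with statement support: leaving the block waits for submitted tasks to finish and releases resources.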
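
As a small sketch of the one-line switch, the executor class can simply be passed in as a parameter. The helper name `run_batch` is hypothetical, and the built-in `abs` is used only as a stand-in task.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_batch(executor_cls, func, items, max_workers=4):
    """Run func over items with whichever executor class is passed in."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))

# Thread-based run.
thread_results = run_batch(ThreadPoolExecutor, abs, [-3, -2, -1, 0, 1])
print(thread_results)  # [3, 2, 1, 0, 1]

# Switching to processes only changes the first argument. On platforms
# that spawn workers, make the call under `if __name__ == "__main__":`, e.g.
#   process_results = run_batch(ProcessPoolExecutor, abs, [-3, -2, -1, 0, 1])
```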

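3. Core Concepts: Executor and Future

3.1 Executor: the executor abstraction

concurrent.futures.Executor is an abstract base class that defines the core interface for running tasks asynchronously. Its two subclasses, ThreadPoolExecutor and ProcessPoolExecutor, both implement these methods:

| Method | Purpose |
|--------------------------------------|-------------------|
| `submit(fn, *args, **kwargs)` | Submits a single task and returns a Future |
| `map(fn, *iterables, timeout=None, chunksize=1)` | Submits a batch of tasks and returns an iterator that yields results in input order |
| `shutdown(wait=True)` | Shuts the executor down and releases its resources |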
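
The three methods can be exercised together without a with block. This is a minimal sketch in which the built-in `pow` stands in for real work:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
try:
    single = executor.submit(pow, 2, 10)                      # one task, returns a Future
    squares = list(executor.map(pow, [1, 2, 3], [2, 2, 2]))   # batch, results in input order
    print(single.result(), squares)                           # 1024 [1, 4, 9]
finally:
    executor.shutdown(wait=True)                              # block until the workers exit

# Once shut down, the executor rejects new work.
try:
    executor.submit(pow, 2, 2)
    rejected = False
except RuntimeError:
    rejected = True
print(rejected)  # True
```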

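3.2 Future: the "receipt" for an asynchronous task

Calling executor.submit() returns a Future immediately. It works like the ticket you get when you order at a restaurant: you do not have to stand at the kitchen door, you can go do something else and pick up your order with the ticket when it is ready.

A Future offers the following core methods: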

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(pow, 2, 10)

    print(future.done())        # True or False, depending on whether the task has finished yet
    print(future.result())      # 1024: blocks until the result is available
```

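| Method | Description |
|-----------------------|----------------|
| `result(timeout=None)` | Returns the task's result, blocking until it is done |
| `done()` | Reports whether the task has finished (or was cancelled) |
| `cancel()` | Tries to cancel the task; succeeds only if it has not started running |
| `exception()` | Returns the exception raised by the task, or None if it succeeded |
| `add_done_callback(fn)` | Registers a callback that is invoked when the task finishes |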
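
The less obvious methods are easier to follow in a small, deterministic sketch. It assumes a single-worker pool so that later tasks are guaranteed to stay queued; the names `gate`, `wait_for_gate`, and `fail` are made up for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

gate = threading.Event()
finished = []

def wait_for_gate():
    gate.wait(timeout=5)   # keeps the only worker busy until the gate opens
    return "first done"

def fail():
    raise ValueError("boom")

with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(wait_for_gate)
    queued = executor.submit(fail)        # waits behind `first`
    to_cancel = executor.submit(pow, 2, 10)

    first.add_done_callback(lambda f: finished.append(f.result()))
    cancelled = to_cancel.cancel()        # True: the task has not started yet
    gate.set()                            # let the worker proceed
    error = queued.exception()            # blocks, then returns the ValueError

print(cancelled, to_cancel.cancelled())   # True True
print(repr(error))                        # ValueError('boom')
print(finished)                           # ['first done']
```

Note that `exception()` returns the error instead of raising it, while `result()` on the same Future would re-raise it.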

4. ThreadPoolExecutor: Thread Pools in Detail

4.1 Basic usage

```python
from concurrent.futures import ThreadPoolExecutor
import time

def worker(name, delay):
    print(f"Thread {name} started")
    time.sleep(delay)
    print(f"Thread {name} finished")
    return f"result-{name}"

# Create a pool that runs at most 3 threads at a time
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately (non-blocking)
    future1 = executor.submit(worker, "A", 2)
    future2 = executor.submit(worker, "B", 1)

    # result() blocks until the task has finished
    print(future1.result())  # waits about 2 seconds, prints result-A
    print(future2.result())  # already done by now, prints result-B
```

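Using a with statement is the recommended practice: when the block ends, shutdown(wait=True) is called automatically, which waits for all tasks to finish and releases resources.

4.2 Batch submission: the map method

When you need to apply one function to a batch of inputs, map is the most compact option: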

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as executor:
    # map returns an iterator that yields results in submission order
    results = executor.map(square, [1, 2, 3, 4, 5])
    print(list(results))  # [1, 4, 9, 16, 25]
```

map also accepts several iterables:

```python
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    return a + b

with ThreadPoolExecutor() as executor:
    results = executor.map(add, [1, 2, 3], [10, 20, 30])
    print(list(results))  # [11, 22, 33]
```

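4.3 Flexible scheduling: submit + as_completed

map has one limitation: it yields results strictly in submission order, so a slow early task holds back the results of later tasks that have already finished. If you want each result as soon as its task completes, combine submit with as_completed: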

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time

def fetch_data(task_id):
    delay = random.uniform(0.5, 2)
    time.sleep(delay)
    return f"task {task_id} done in {delay:.2f}s"

with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit all tasks
    futures = [executor.submit(fetch_data, i) for i in range(5)]

    # Consume results in completion order
    for future in as_completed(futures):
        result = future.result()
        print(result)

# Sample output (order varies; whichever task finishes first prints first):
# task 2 done in 0.62s
# task 0 done in 0.89s
# task 4 done in 1.23s
# ...
```

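The main advantages of as_completed are:

Immediate response: each task is handled as soon as it finishes, without waiting for the slowest one.
Graceful fault tolerance: a try/except inside the loop catches one task's exception without affecting the others.
Controlled waiting: a timeout is easy to add.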
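
The difference in ordering between map and as_completed can be seen side by side. The sketch below uses fixed delays so the outcome is predictable; `sleep_then_return` is an illustrative helper, not something from the original examples.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def sleep_then_return(delay):
    time.sleep(delay)
    return delay

delays = [0.6, 0.2, 0.4]

with ThreadPoolExecutor(max_workers=3) as executor:
    in_input_order = list(executor.map(sleep_then_return, delays))

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(sleep_then_return, d) for d in delays]
    in_completion_order = [f.result() for f in as_completed(futures)]

print(in_input_order)       # [0.6, 0.2, 0.4]
print(in_completion_order)  # typically [0.2, 0.4, 0.6]
```

With map, the 0.2 s and 0.4 s results sit behind the 0.6 s task; with as_completed they are available as soon as they finish.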

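4.4 Setting timeouts

Both result() and as_completed accept a timeout. The timeout only limits how long the caller waits; it does not stop a task that is already running.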

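```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
# Alias of the built-in TimeoutError since Python 3.11; a separate class before that
from concurrent.futures import TimeoutError as FutureTimeoutError

def slow_task(seconds):
    time.sleep(seconds)
    return seconds

with ThreadPoolExecutor(max_workers=2) as executor:
    # Timeout on a single task
    future = executor.submit(slow_task, 10)
    try:
        result = future.result(timeout=5)  # wait at most 5 seconds
    except FutureTimeoutError:
        print("Task timed out!")
        future.cancel()  # returns False here: a running task cannot be cancelled

    # as_completed also accepts a timeout, covering the whole batch
    futures = [executor.submit(slow_task, 0.1) for _ in range(3)]
    for future in as_completed(futures, timeout=10):
        result = future.result()
```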

5. ProcessPoolExecutor: Process Pools in Detail

5.1 Basic usage

ProcessPoolExecutor is used almost exactly like ThreadPoolExecutor, but it serves a different purpose:

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_intensive_task(n):
    """CPU-bound task: compute the n-th Fibonacci number"""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        # Submit several tasks; they run in parallel in separate processes
        futures = [executor.submit(cpu_intensive_task, 1000000) for _ in range(4)]
        results = [f.result() for f in futures]
```

5.2 Batch submission with map

```python
from concurrent.futures import ProcessPoolExecutor

def process_data(data):
    return data ** 2 + data * 2

if __name__ == "__main__":
    data = list(range(10000))
    with ProcessPoolExecutor(max_workers=4) as executor:
        # Note: map returns an iterator that yields results in input order
        results = executor.map(process_data, data)
        processed = list(results)
```

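5.3 Special considerations on Windows

When using ProcessPoolExecutor on Windows, the program's entry-point code must sit inside an if __name__ == "__main__": block. Windows has no fork(), so each child process is started fresh and re-imports the main module; without the guard, every child would try to create its own pool, leading to runaway recursive process creation. The same applies on any platform that uses the spawn start method, which has been the default on macOS since Python 3.8.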

```python
# Correct pattern
from concurrent.futures import ProcessPoolExecutor

def worker(x):
    return x * x

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = executor.map(worker, range(10))
        print(list(results))
```

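5.4 Limits on inter-process communication

Because every process has its own memory space, the function, arguments, and return values passed through a process pool must be serializable, meaning the pickle module can handle them.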

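```python
# Fragments for illustration; executor is assumed to be a ProcessPoolExecutor.
# Pickling errors surface when future.result() is called.

# Not OK: a lambda cannot be pickled
executor.submit(lambda x: x * 2, 10)

# Not OK: an instance of a class defined inside a function
def make_local():
    class LocalClass:
        pass
    return LocalClass()
executor.submit(print, make_local())

# OK: functions and classes defined at module level
def my_func(x):
    return x * 2

class MyClass:
    pass
executor.submit(my_func, 10)
executor.submit(isinstance, MyClass(), MyClass)
```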

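5.5 How a process pool works

ProcessPoolExecutor is built on top of the multiprocessing module but offers a higher-level abstraction:

Startup: up to max_workers worker processes are started and wait for work. In current CPython they are launched when tasks are first submitted, not in the constructor.
Submitting a task: the function and its arguments are serialized and sent through a queue to a free worker process.
Running a task: the worker deserializes the payload, runs the function, then serializes the result and sends it back to the main process.
Resource management: leaving the with block or calling shutdown() explicitly signals all worker processes to exit.

A simplified view of the internal flow:

main process -> task queue -> worker process -> result queue -> main process

ProcessPoolExecutor also runs a queue management thread in the main process. It collects finished results from the result queue and attaches each one to its Future, which is how future.result() in the main process gets its value.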
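
The flow above can be imitated with a toy model. This is only a sketch of the idea and not how CPython implements it: threads stand in for the worker process and the management thread, `queue.Queue` stands in for the inter-process queues, and pickle round-trips mark the process boundary. All names (`worker_loop`, `collector_loop`, `submit`, `pending`) are invented for this illustration.

```python
import pickle
import queue
import threading
from concurrent.futures import Future

task_queue = queue.Queue()     # main -> worker
result_queue = queue.Queue()   # worker -> main
pending = {}                   # work id -> Future waiting for its result

def worker_loop():
    """Stand-in for a worker process: unpickle, run, pickle the outcome."""
    while True:
        payload = task_queue.get()
        if payload is None:            # exit signal sent at shutdown
            break
        work_id, func, args = pickle.loads(payload)
        try:
            outcome = (work_id, func(*args), None)
        except Exception as exc:
            outcome = (work_id, None, exc)
        result_queue.put(pickle.dumps(outcome))

def collector_loop():
    """Stand-in for the management thread: match results to Futures."""
    while True:
        payload = result_queue.get()
        if payload is None:
            break
        work_id, value, exc = pickle.loads(payload)
        future = pending.pop(work_id)
        if exc is None:
            future.set_result(value)
        else:
            future.set_exception(exc)

def submit(work_id, func, *args):
    future = Future()   # created by hand only for this illustration
    pending[work_id] = future
    task_queue.put(pickle.dumps((work_id, func, args)))
    return future

worker = threading.Thread(target=worker_loop)
collector = threading.Thread(target=collector_loop)
worker.start()
collector.start()

ok = submit(1, pow, 2, 10)
bad = submit(2, divmod, 1, 0)
print(ok.result(timeout=5))                     # 1024
print(type(bad.exception(timeout=5)).__name__)  # ZeroDivisionError

task_queue.put(None)     # tell the worker to exit
worker.join()
result_queue.put(None)   # then stop the collector
collector.join()
```

The model also shows why everything crossing the boundary has to be picklable: the function, the arguments, the return value, and even the exception all pass through `pickle.dumps`.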

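6. Thread Pool vs Process Pool: A Full Comparison

6.1 Key differences

| Dimension | ThreadPoolExecutor | ProcessPoolExecutor |
|-------|-----------------------|------------------------|
| Underlying module | threading | multiprocessing |
| Best for | I/O-bound work (network, files, databases) | CPU-bound work (computation, image processing, encryption) |
| GIL impact | Limited by the GIL; Python bytecode does not run in parallel | Sidesteps the GIL; each process has its own interpreter and runs truly in parallel |
| Memory footprint | Low (threads share the process's memory; each adds mainly its own stack) | High (each process has its own memory, roughly 50 MB or more per process) |
| Startup speed | Fast (microseconds) | Slow (milliseconds; a new process must be forked or spawned) |
| Data sharing | Easy (shared memory, but locking is needed) | Hard (requires serialization or IPC) |
| Fault isolation | A crashing thread can take down the whole process | A crashing worker does not take down the parent, although the pool itself becomes unusable |
| Practical concurrency | Thousands | Tens to hundreds |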

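6.2 Choosing a pool size

ThreadPoolExecutor: for I/O-bound work a larger value (such as 50 to 200) is reasonable, depending on how much load the target service can take. The default pool size is min(32, os.cpu_count() + 4) as of Python 3.8, which is enough for most I/O tasks.

ProcessPoolExecutor: os.cpu_count() is the usual choice, at most one or two above it. Going well beyond the number of cores only adds context-switching overhead without improving performance. On Windows, max_workers must not exceed 61.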

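```python
import os
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# Number of CPU cores (os.cpu_count() can return None, so fall back to 1)
cpu_count = os.cpu_count() or 1

# Suggested starting points
thread_pool_size = min(64, cpu_count * 4)  # I/O-bound: can be scaled up
process_pool_size = cpu_count               # CPU-bound: keep at or below the core count
```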

7. Worked Examples

7.1 Example 1: a multithreaded crawler

Scenario: download the content of 100 web pages in bulk

```python
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://httpbin.org/delay/0.5"] * 100  # simulates a delayed response

def fetch(url):
    response = requests.get(url, timeout=10)
    return response.status_code

# Sequential version
start = time.time()
sync_results = [fetch(url) for url in urls]
print(f"Sequential: {time.time() - start:.2f}s")  # about 50 s (100 x 0.5 s, plus network overhead)

# Thread pool version
start = time.time()
with ThreadPoolExecutor(max_workers=20) as executor:
    futures = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(futures):
        status = future.result()
print(f"Thread pool: {time.time() - start:.2f}s")  # about 2.5 s (5 rounds x 0.5 s), roughly 20x faster
```

7.2 Example 2: multiprocess image processing

Scenario: batch-process images (grayscale conversion, resizing, and other CPU-bound operations)

```python
from PIL import Image
from concurrent.futures import ProcessPoolExecutor
import os

def process_image(image_path):
    """Process one image: convert to grayscale and scale to 50%"""
    with Image.open(image_path) as img:
        # CPU-bound operations
        gray = img.convert('L')
        resized = gray.resize((img.width // 2, img.height // 2))
        output_path = f"processed_{os.path.basename(image_path)}"
        resized.save(output_path)
    return output_path

if __name__ == "__main__":
    images = [f"img_{i}.jpg" for i in range(100)]

    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        results = list(executor.map(process_image, images))
    print(f"Processed {len(results)} images")
```

7.3 Example 3: a hybrid architecture (thread pool + process pool)

Scenario: download images, then compress them (a mixed I/O + CPU workload)

```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def download_image(url):
    """I/O-bound: download an image"""
    response = requests.get(url, timeout=10)
    return response.content

def compress_image(image_data):
    """CPU-bound: compress an image (simulated)"""
    # Simulated compression
    return len(image_data) // 2

def process_image_pipeline(url):
    """Mixed pipeline"""
    # Download (I/O)
    img_data = download_image(url)
    # Compress (CPU)
    compressed = compress_image(img_data)
    return compressed

# Option 1: run process_image_pipeline in a single thread pool
#           (not recommended: the CPU-bound step would run on threads)
# Option 2: handle the two stages separately
def main():
    urls = [...]  # elided in the original; supply real image URLs

    # Step 1: download concurrently with a thread pool
    with ThreadPoolExecutor(max_workers=20) as downloader:
        downloaded = list(downloader.map(download_image, urls))

    # Step 2: compress in parallel with a process pool
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as compressor:
        compressed = list(compressor.map(compress_image, downloaded))

    return compressed

if __name__ == "__main__":
    main()
```

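8. Exception Handling and Resource Management

8.1 Exceptions inside a Future

An exception raised inside a thread pool or process pool does not show up in the main thread right away. It is stored in the Future and re-raised only when result() is called.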

```python
from concurrent.futures import ThreadPoolExecutor

def divide(a, b):
    return a / b

with ThreadPoolExecutor() as executor:
    future = executor.submit(divide, 10, 0)
    # The exception is not raised here!

    # It is re-raised when result() is called
    try:
        result = future.result()
    except ZeroDivisionError as e:
        print(f"Caught exception: {e}")
```

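8.2 Three exception-handling patterns

| Pattern | When to use it | Code example |
|-----------------|---------------|---------------------------------|
| submit + result | A few tasks, where failures must be pinpointed immediately | `future.result()` |
| as_completed | Many tasks, where failures should be collected and tolerated | `for f in as_completed(futures):` |
| map | Pipeline-style processing that needs all-or-nothing consistency | `results = executor.map(...)` |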

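```python
# Pattern 2: as_completed + fault-tolerant collection
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky_task(i):
    # Hypothetical task for illustration: fails on multiples of 3
    if i % 3 == 0:
        raise ValueError(f"task {i} failed")
    return i

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(risky_task, i) for i in range(10)]
    successes = []
    failures = []

    for future in as_completed(futures):
        try:
            result = future.result()
            successes.append(result)
        except Exception as e:
            failures.append(str(e))

    print(f"Succeeded: {len(successes)}, failed: {len(failures)}")
```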
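
For comparison, here is a minimal sketch of how the map pattern behaves when one item fails. The input list is made up, and the built-in `int` serves as the task so that `"x"` triggers a ValueError.

```python
from concurrent.futures import ThreadPoolExecutor

raw_values = ["1", "2", "x", "4"]
parsed = []
error = None

with ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(int, raw_values)
    try:
        for value in results:
            parsed.append(value)
    except ValueError as exc:
        # Raised when iteration reaches the failed item; the iterator is
        # exhausted afterwards, so the result for "4" is never delivered.
        error = exc

print(parsed)  # [1, 2]
print(error)
```

This is why map suits all-or-nothing pipelines: the first failure ends the iteration, whereas as_completed lets the remaining results be collected.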

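8.3 Resource cleanup: shutdown and BrokenProcessPool

```python
# Correct resource management (task stands for your own function)
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(task, i) for i in range(10)]
    # shutdown(wait=True) is called automatically when the with block ends
```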

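If a worker process in the pool dies unexpectedly, pending futures and any later submissions raise a `BrokenProcessPool` exception, and the executor has to be recreated.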

```python
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

try:
    with ProcessPoolExecutor(max_workers=2) as executor:
        # run tasks...
        pass
except BrokenProcessPool as e:
    print(f"Process pool is broken and must be recreated: {e}")
```

9. Performance Measurements

9.1 CPU-bound comparison

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_heavy(n):
    count = 0
    for i in range(n):
        count += i * i
    return count

def benchmark_cpu():
    n = 50_000_000
    workers = 4

    # Single thread
    start = time.time()
    [cpu_heavy(n) for _ in range(workers)]
    single = time.time() - start

    # Thread pool (held back by the GIL, essentially no gain)
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(cpu_heavy, [n] * workers))
    thread = time.time() - start

    # Process pool (true parallelism)
    start = time.time()
    with ProcessPoolExecutor(max_workers=workers) as ex:
        list(ex.map(cpu_heavy, [n] * workers))
    process = time.time() - start

    print(f"Single thread: {single:.2f}s")
    print(f"Thread pool: {thread:.2f}s")
    print(f"Process pool: {process:.2f}s")

if __name__ == "__main__":
    benchmark_cpu()
```

Results (absolute numbers vary with hardware and Python version):

Single thread: 3.2s
Thread pool: 3.4s (slightly slower, because of thread switching and lock contention)
Process pool: 0.9s (close to a 4x speedup)

9.2 I/O-bound comparison

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def io_task(delay):
    time.sleep(delay)  # simulates network I/O
    return delay

def benchmark_io():
    n_tasks = 50
    delay = 0.1
    workers = 20

    # Single thread
    start = time.time()
    [io_task(delay) for _ in range(n_tasks)]
    single = time.time() - start

    # Thread pool
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(io_task, [delay] * n_tasks))
    thread = time.time() - start

    # Process pool (higher startup cost, slightly slower)
    start = time.time()
    with ProcessPoolExecutor(max_workers=workers) as ex:
        list(ex.map(io_task, [delay] * n_tasks))
    process = time.time() - start

    print(f"Single thread: {single:.2f}s")
    print(f"Thread pool: {thread:.2f}s")
    print(f"Process pool: {process:.2f}s")

if __name__ == "__main__":
    benchmark_io()
```

Results (absolute numbers vary with hardware and Python version):

Single thread: 5.0s
Thread pool: 0.35s
Process pool: 0.45s (slightly slower, because creating processes is expensive)

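10. FAQ and Pitfalls

Q1: Can thread pools and process pools be nested?

Yes, with care. Using a ProcessPoolExecutor from inside a thread pool works, and so does using a ThreadPoolExecutor inside a process pool worker, but avoid nesting so deeply that the workers start competing for resources.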

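Q2: How do I choose between executor.map and as_completed?

| Scenario | Recommended |
|-----------------|--------------|
| Results must keep the input order | map |
| Tasks should be handled as soon as they finish | as_completed |
| Fault tolerance is needed (one failure must not affect the rest) | as_completed |
| Simple batch mapping | map |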

Q3: future.result() blocks. How do I avoid blocking the main thread?

If you do not want to block, use a callback:

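```python
# Fragment: executor, task and arg come from your own code
def callback(future):
    # Runs once the task is done, so result() returns without waiting
    print(f"Task finished, result: {future.result()}")

future = executor.submit(task, arg)
future.add_done_callback(callback)
# The main thread carries on with other work
```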

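Q4: Why does my thread pool not speed up CPU-bound work?

Because of the GIL, CPU-bound Python code cannot run truly in parallel on threads in standard CPython. Switch to ProcessPoolExecutor.

Q5: Why does my process pool fail with AttributeError: Can't pickle local object?

A process pool has to serialize both the function and its arguments. Make sure that the function is defined at module top level (not inside another function), that the arguments are picklable (for example built-in types or instances of module-level classes), and that no lambda expressions are used.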
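
A quick way to check an object before handing it to a process pool is to try pickling it yourself. The sketch below is an illustration only: `can_pickle` is a hypothetical helper, and `functools.partial` over `operator.mul` is one possible replacement for a lambda.

```python
import functools
import operator
import pickle

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as a process pool requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

double_lambda = lambda x: x * 2                        # cannot be pickled
double_partial = functools.partial(operator.mul, 2)    # picklable replacement

print(can_pickle(double_lambda))    # False
print(can_pickle(double_partial))   # True
print(double_partial(10))           # 20
```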

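11. concurrent.futures vs Other Concurrency Options

| Option | Best for | Strengths | Weaknesses |
|---------------------------|---------------|--------------|-----------------|
| concurrent.futures | Moderate concurrency, batch task processing | Uniform, compact API with automatic management | Poor fit for huge numbers of tiny tasks |
| threading/multiprocessing | Cases that need fine-grained control | Very flexible | More complex code, easier to get wrong |
| asyncio | Very high-concurrency I/O (tens of thousands of connections) | Very low memory footprint, excellent performance | Requires rewriting code with async/await |
| Celery | Distributed task queues | Scales across machines | Heavyweight, needs extra infrastructure |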

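```python
import asyncio
import aiohttp
from concurrent.futures import ThreadPoolExecutor

# Simple case: concurrent.futures (func and data come from your own code)
with ThreadPoolExecutor() as executor:
    results = executor.map(func, data)

# High-concurrency I/O: asyncio (urls comes from your own code)
async def fetch(session, url):
    # Hypothetical helper, not in the original: returns the HTTP status
    async with session.get(url) as response:
        return response.status

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
```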

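12. Summary

Key takeaways

| Point | Summary |
|----------|----------------------------------------------|
| Uniform API | ThreadPoolExecutor and ProcessPoolExecutor are used in almost exactly the same way |
| Choosing a pool | Thread pool for I/O-bound work, process pool for CPU-bound work |
| Future objects | Represent the state, result, and exception of an asynchronous task |
| Resource management | Use a with statement to manage the pool automatically and avoid leaks |
| Exception handling | A task's exception surfaces only when `result()` (or `exception()`) is called |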