concurrent.futures in Practice: A Unified Abstraction for Process Pools and Thread Pools

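The GIL limits CPU parallelism for Python threads, but it does not limit I/O concurrency. concurrent.futures manages thread pools and process pools through one unified API. Picking the right pool matters: in the CPU-bound benchmark later in this article, the process pool is about 3.5x faster than the thread pool on a 4-core machine.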

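Why pools: the cost of creating threads and processes

Creating a thread or a process is not free, and the cost adds up when it is paid once per task: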

import time
import threading
import multiprocessing

def dummy():
    pass

if __name__ == "__main__":  # guard required by multiprocessing on spawn-based platforms
    # Measure thread creation cost
    start = time.perf_counter()
    for _ in range(1000):
        t = threading.Thread(target=dummy)
        t.start()
        t.join()
    print(f"1000 threads: {time.perf_counter() - start:.3f}s")  # roughly 0.5~1.0s; machine-dependent

    # Measure process creation cost
    start = time.perf_counter()
    for _ in range(100):
        p = multiprocessing.Process(target=dummy)
        p.start()
        p.join()
    print(f"100 processes: {time.perf_counter() - start:.3f}s")  # roughly 1.0~2.0s; machine-dependent

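Thread pools and process pools keep a bounded set of workers (at most max_workers), hand each incoming task to an idle worker, and reuse the workers afterwards, which avoids repeated creation and teardown. Workers are typically started lazily as tasks are submitted rather than when the executor is constructed: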

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def task(n: int) -> int:
    return n * n

if __name__ == "__main__":  # guard required by ProcessPoolExecutor on spawn-based platforms
    # Thread pool: at most 4 threads, reused across all 100 tasks
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, i) for i in range(100)]
        results = [f.result() for f in futures]

    # Process pool: at most 4 processes, reused across all 100 tasks
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, i) for i in range(100)]
        results = [f.result() for f in futures]
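Worker reuse can be made visible with a small sketch that records which thread ran each task. The helper name `record_worker` and the task count of 20 are illustrative assumptions, not part of the original example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def record_worker(n: int) -> str:
    """Hypothetical helper: return the name of the thread that ran this task."""
    return threading.current_thread().name

with ThreadPoolExecutor(max_workers=4) as executor:
    worker_names = list(executor.map(record_worker, range(20)))

distinct_workers = set(worker_names)
# 20 tasks ran, but never on more than 4 distinct threads.
print(f"{len(worker_names)} tasks ran on {len(distinct_workers)} thread(s)")
```

The exact number of distinct threads can be lower than 4, because a new worker is only started when no idle worker is available.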

ThreadPoolExecutor vs ProcessPoolExecutor: choosing between them

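Starting point: tasks need to run in parallel. The first question is the task type.

I/O-bound (network requests, file reads and writes, database queries): use ThreadPoolExecutor, because the GIL is released during I/O. Rule of thumb: max_workers = CPU cores × 5.
CPU-bound (numeric computation, image processing, encryption and decryption): use ProcessPoolExecutor, which bypasses the GIL for true parallelism. Rule of thumb: max_workers = CPU cores.
Mixed: is there shared state?
    A lot of shared data: use ThreadPoolExecutor and protect the shared state with a Lock.
    Independent data: use ProcessPoolExecutor and pass data by serialization.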
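The I/O-bound branch can be checked with a minimal sketch. It assumes `time.sleep` is an acceptable stand-in for a blocking I/O call (it releases the GIL while waiting); the names `fake_io`, `timed`, `run_serial`, and `run_threaded` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(_: int) -> None:
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(0.1)

def timed(fn) -> float:
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_serial() -> None:
    for i in range(8):
        fake_io(i)

def run_threaded() -> None:
    with ThreadPoolExecutor(max_workers=8) as executor:
        list(executor.map(fake_io, range(8)))

serial_s = timed(run_serial)
threaded_s = timed(run_threaded)
print(f"serial: {serial_s:.2f}s, threaded: {threaded_s:.2f}s")
```

With eight waits overlapping, the threaded run should finish in roughly the time of one wait, while the serial run pays for all eight.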

Performance comparison experiment

import time
import math
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# CPU-bound task: counting primes
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def count_primes_in_range(start: int, end: int) -> int:
    return sum(1 for n in range(start, end) if is_prime(n))

def benchmark(executor_class, max_workers: int):
    """Compare thread pool and process pool on a CPU-bound task."""
    # Half-open ranges [start, end) that cover 1..199999 with no gaps
    ranges = [
        (1, 50000),
        (50000, 100000),
        (100000, 150000),
        (150000, 200000),
    ]
    
    start = time.perf_counter()
    with executor_class(max_workers=max_workers) as executor:
        futures = [executor.submit(count_primes_in_range, s, e) for s, e in ranges]
        results = [f.result() for f in futures]
    
    elapsed = time.perf_counter() - start
    total = sum(results)
    return elapsed, total

# Comparison
import os

if __name__ == "__main__":  # guard required by ProcessPoolExecutor on spawn-based platforms
    cpu_count = os.cpu_count() or 4
    print(f"CPU cores: {cpu_count}\n")

    t_elapsed, t_total = benchmark(ThreadPoolExecutor, cpu_count)
    p_elapsed, p_total = benchmark(ProcessPoolExecutor, cpu_count)

    print(f"ThreadPool: {t_elapsed:.2f}s, found {t_total} primes")
    print(f"ProcessPool: {p_elapsed:.2f}s, found {p_total} primes")
    print(f"Speedup: {t_elapsed / p_elapsed:.1f}x")

Typical output (4-core machine; absolute times depend on hardware):

ThreadPool: 8.52s, found 17984 primes
ProcessPool: 2.41s, found 17984 primes
Speedup: 3.5x

Three ways to submit work to an Executor

Method 1: `submit` + `Future.result()`

from concurrent.futures import ThreadPoolExecutor, as_completed
import time
import random

def fetch_url(url: str) -> str:
    """Simulate a network request."""
    delay = random.uniform(0.5, 2.0)
    time.sleep(delay)
    return f"Fetched {url} in {delay:.2f}s"

with ThreadPoolExecutor(max_workers=5) as executor:
    # submit returns a Future immediately, without blocking
    future_to_url = {
        executor.submit(fetch_url, f"https://api.example.com/item/{i}"): i
        for i in range(10)
    }

    # Handle results in completion order
    for future in as_completed(future_to_url):
        url_index = future_to_url[future]
        try:
            result = future.result()
            print(f"  [{url_index}] {result}")
        except Exception as e:
            print(f"  [{url_index}] Failed: {e}")

Method 2: `map`, batch submission that preserves order

from concurrent.futures import ThreadPoolExecutor

def process(n: int) -> int:
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    # map returns an iterator that yields results in input order
    results = executor.map(process, range(10))
    print(list(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

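map vs submit:

Aspect	`submit`	`map`
What it returns	A `Future`, immediately	An iterator over results; tasks are submitted right away, results are fetched lazily
Result order	Completion order, via `as_completed`	Same order as the input
Exception handling	Raised by `future.result()`	Raised when iteration reaches the failing element
Best for	Handling results as they complete, per-task timeouts	Batch operations whose results must stay in input order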
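The exception-handling row is easiest to see in a short sketch. The task `reciprocal` and its inputs are hypothetical, chosen so that the third element fails.

```python
from concurrent.futures import ThreadPoolExecutor

def reciprocal(n: int) -> float:
    """Hypothetical task that raises ZeroDivisionError for n == 0."""
    return 1 / n

collected = []
error = None

with ThreadPoolExecutor(max_workers=2) as executor:
    results = executor.map(reciprocal, [4, 2, 0, 1])
    try:
        for value in results:
            collected.append(value)
    except ZeroDivisionError as exc:
        # Raised only when iteration reaches the failing element
        error = exc

print(collected)             # [0.25, 0.5]
print(type(error).__name__)  # ZeroDivisionError
```

Once the iterator raises, the results after the failing element are no longer reachable through it. When every task's outcome is needed, `submit` with a `try`/`except` around each `future.result()` keeps failures isolated.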

Method 3: `submit` + `wait`, fine-grained control

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED

# fetch_url is the simulated request function defined in Method 1
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(fetch_url, f"url-{i}"): i for i in range(20)}

    # Wait until at least one Future completes
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    print(f"First completed: {done.pop().result()}")
    print(f"Still pending: {len(not_done)}")

    # Then wait for all Futures to complete
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    print(f"All {len(done)} tasks completed")

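Options for the `return_when` argument of `wait`

Constant	Behavior
`ALL_COMPLETED` (default)	Returns when every Future has finished or been cancelled
`FIRST_COMPLETED`	Returns when any Future finishes or is cancelled
`FIRST_EXCEPTION`	Returns when any Future finishes by raising an exception; if none raises, it behaves like `ALL_COMPLETED`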
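`FIRST_EXCEPTION` is the least obvious of the three, so here is a minimal sketch of it. The function `job` and its delays are assumptions chosen so that the failing task finishes well before the successful one.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION

def job(delay: float, fail: bool) -> float:
    """Hypothetical task: sleep, then either fail or return the delay."""
    time.sleep(delay)
    if fail:
        raise ValueError(f"job with delay {delay} failed")
    return delay

with ThreadPoolExecutor(max_workers=3) as executor:
    slow_ok = executor.submit(job, 0.5, False)
    fast_fail = executor.submit(job, 0.05, True)

    # Returns as soon as fast_fail raises, without waiting for slow_ok
    done, not_done = wait([slow_ok, fast_fail], return_when=FIRST_EXCEPTION)
    failed_first = fast_fail in done
    slow_still_pending = slow_ok in not_done

print(f"failed future returned first: {failed_first}")
print(f"slow future still pending at that point: {slow_still_pending}")
```

`wait` itself never raises the task's exception; it only reports which Futures are done. The exception is available through `fast_fail.exception()` or by calling `fast_fail.result()`.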

The complete `Future` API

import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow_task(n: int) -> int:
    time.sleep(n)
    return n * 2

with ThreadPoolExecutor() as executor:
    future = executor.submit(slow_task, 5)

    # State checks
    print(f"Running: {future.running()}")     # usually True once a worker has picked the task up
    print(f"Done: {future.done()}")           # False
    print(f"Cancelled: {future.cancelled()}") # False

    # Cancel (only succeeds if the task has not started running)
    cancelled = future.cancel()
    print(f"Cancel attempt: {cancelled}")     # False once the task is running

    # Fetch the result with a timeout
    try:
        result = future.result(timeout=2)     # give up waiting after 2 seconds
    except TimeoutError:
        print("Task timed out!")              # the task itself keeps running

    # Register a completion callback
    def on_done(f):
        print(f"Task completed: {f.result()}")

    future.add_done_callback(on_done)
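Because `cancel()` only works on a task that has not started, a successful cancellation needs a task that is still queued. The sketch below forces that situation with a single worker; the names `blocker`, `never_runs`, and `release` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

release = threading.Event()

def blocker() -> str:
    """Occupies the only worker until the main thread sets the event."""
    release.wait(timeout=5)
    return "blocker finished"

def never_runs() -> str:
    return "should have been cancelled"

with ThreadPoolExecutor(max_workers=1) as executor:
    first = executor.submit(blocker)
    queued = executor.submit(never_runs)  # waits behind `first`
    cancelled = queued.cancel()           # succeeds: the task is still queued
    release.set()                         # let the blocker finish
    first_result = first.result(timeout=5)

print(cancelled)           # True
print(queued.cancelled())  # True
print(first_result)        # blocker finished
```

Calling `queued.result()` after a successful cancel raises `concurrent.futures.CancelledError`, so code that may cancel tasks should check `cancelled()` first.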

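Bridging asyncio and `concurrent.futures`

An asyncio application inevitably ends up calling synchronous functions (third-party libraries, CPU-heavy computation). run_in_executor is the bridge between the async world and the sync world: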

import asyncio
import time
import hashlib
from concurrent.futures import ProcessPoolExecutor

# CPU-heavy work: hashlib's pbkdf2
def hash_password(password: str, salt: str) -> str:
    """Synchronous password hashing (CPU-bound)."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode(),
        salt.encode(),
        600_000,  # iteration count
    ).hex()

# Default thread pool
async def hash_with_thread_pool(password: str, salt: str) -> str:
    """Run in the default thread pool.

    Threads give little speedup for pure-Python CPU work. C-level routines
    such as pbkdf2_hmac may release the GIL while hashing, so measure first.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, hash_password, password, salt)

# Dedicated process pool
async def hash_with_process_pool(
    pool: ProcessPoolExecutor, password: str, salt: str
) -> str:
    """Run in a caller-owned process pool for true parallelism.

    The pool is passed in so it is created once and reused; building a new
    pool on every call would pay the process startup cost each time.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, hash_password, password, salt)

# More concise syntax in Python 3.9+
async def hash_with_to_thread(password: str, salt: str) -> str:
    """asyncio.to_thread: syntactic sugar for the default thread pool."""
    return await asyncio.to_thread(hash_password, password, salt)


# Usage example
async def main():
    passwords = [
        ("user_password_1", "salt_abc"),
        ("user_password_2", "salt_def"),
        ("user_password_3", "salt_ghi"),
    ]

    # Hash concurrently: each call runs in a worker thread via asyncio.to_thread
    start = time.perf_counter()
    tasks = [hash_with_to_thread(pw, salt) for pw, salt in passwords]
    results = await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    
    print(f"Hashed {len(passwords)} passwords in {elapsed:.2f}s")
    for (pw, _), result in zip(passwords, results):
        print(f"  {pw} → {result[:16]}...")

asyncio.run(main())

How `run_in_executor` works

import asyncio
import time

# What run_in_executor does internally
async def run_in_executor_explained():
    loop = asyncio.get_running_loop()

    def blocking_task():
        time.sleep(2)  # synchronous, blocking call
        return "done"

    # 1. Submit blocking_task to the thread pool
    # 2. Create a Future on the event loop
    # 3. When the pool finishes, hand the result back to the event loop via call_soon_threadsafe
    result = await loop.run_in_executor(None, blocking_task)

    # Roughly equivalent to:
    # concurrent_future = executor.submit(blocking_task)
    # asyncio_future = asyncio.wrap_future(concurrent_future)
    # result = await asyncio_future
    return result
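The equivalence sketched in the comments can be run directly. This is a minimal illustration, assuming an explicitly created `ThreadPoolExecutor` instead of the loop's default executor, with the sleep shortened to 0.1 s.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_task() -> str:
    time.sleep(0.1)  # synchronous, blocking call
    return "done"

async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Path 1: let the event loop submit the call and wrap the result
        via_loop = await loop.run_in_executor(pool, blocking_task)
        # Path 2: submit manually, then wrap the concurrent Future for awaiting
        via_wrap = await asyncio.wrap_future(pool.submit(blocking_task))
    return via_loop, via_wrap

via_loop, via_wrap = asyncio.run(main())
print(via_loop, via_wrap)  # done done
```

Both paths yield an awaitable that completes on the event loop thread, so the coroutine never blocks while the worker thread sleeps.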

Engineering example 1: batch image processing

"""Batch image processing: a process pool for CPU-heavy image operations."""
import os
import time
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor, as_completed, Future
from typing import NamedTuple

from PIL import Image


class ProcessResult(NamedTuple):
    path: str
    original_size: tuple[int, int]
    new_size: tuple[int, int]
    original_bytes: int
    new_bytes: int
    elapsed_ms: float


def process_single_image(
    image_path: str,
    output_dir: str,
    max_size: tuple[int, int] = (1200, 1200),
    quality: int = 85,
) -> ProcessResult:
    """Process one image: resize, optimize, convert to JPEG."""
    start = time.perf_counter()
    original_bytes = os.path.getsize(image_path)

    with Image.open(image_path) as img:
        original_size = img.size

        # Downscale in place, preserving aspect ratio
        img.thumbnail(max_size, Image.Resampling.LANCZOS)
        new_size = img.size

        # Convert to RGB when needed (JPEG cannot store alpha or palette modes)
        if img.mode not in ("RGB", "L"):
            img = img.convert("RGB")

        # Save into the output directory
        output_path = Path(output_dir) / f"{Path(image_path).stem}.jpg"
        img.save(output_path, "JPEG", quality=quality, optimize=True)

    new_bytes = os.path.getsize(output_path)
    elapsed = (time.perf_counter() - start) * 1000

    return ProcessResult(
        path=str(output_path),
        original_size=original_size,
        new_size=new_size,
        original_bytes=original_bytes,
        new_bytes=new_bytes,
        elapsed_ms=elapsed,
    )


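def batch_process_images(
    image_paths: list[str],
    output_dir: str,
    max_workers: int | None = None,
) -> dict[str, ProcessResult]:
    """Process images in batch using a process pool."""
    max_workers = max_workers or os.cpu_count() or 4
    Path(output_dir).mkdir(parents=True, exist_ok=True)  # img.save fails if the directory is missing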

    results: dict[str, ProcessResult] = {}
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_path: dict[Future[ProcessResult], str] = {
            executor.submit(process_single_image, path, output_dir): path
            for path in image_paths
        }

        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                result = future.result()
                results[path] = result
            except Exception as e:
                print(f"Failed to process {path}: {e}")

    return results


# Usage
if __name__ == "__main__":
    images = [str(p) for p in Path("photos").glob("*.jpg")]
    results = batch_process_images(images, "photos/processed")

    total_saved = sum(
        r.original_bytes - r.new_bytes for r in results.values()
    )
    print(f"\nProcessed {len(results)} images")
    print(f"Total space saved: {total_saved / (1024 * 1024):.1f} MB")

Engineering example 2: concurrent API aggregator

"""Concurrent API aggregator: a thread pool for I/O-bound HTTP requests."""
import time
from typing import Any
from dataclasses import dataclass, field
from concurrent.futures import ThreadPoolExecutor, Future, as_completed

import httpx


@dataclass
class AggregatedResponse:
    """Aggregated response: holds one endpoint's result when merging multiple APIs."""
    endpoint: str
    data: dict[str, Any]
    elapsed_ms: float
    errors: list[str] = field(default_factory=list)


class APIAggregator:
    """Concurrent API aggregator: requests multiple endpoints at once and merges the responses."""

    def __init__(
        self,
        base_url: str,
        max_workers: int = 10,
        timeout: float = 30.0,
    ):
        self.base_url = base_url
        self.max_workers = max_workers
        self.timeout = timeout
        self._client: httpx.Client | None = None

    def __enter__(self):
        self._client = httpx.Client(
            base_url=self.base_url,
            timeout=self.timeout,
        )
        return self

    def __exit__(self, *args):
        if self._client:
            self._client.close()

    def fetch_endpoint(self, path: str) -> AggregatedResponse:
        """Fetch a single endpoint (runs in a worker thread)."""
        if not self._client:
            raise RuntimeError("Use as context manager")

        start = time.perf_counter()
        try:
            response = self._client.get(path)
            response.raise_for_status()
            data = response.json()
            errors = []
        except httpx.HTTPError as e:
            data = {}
            errors = [str(e)]

        elapsed = (time.perf_counter() - start) * 1000
        return AggregatedResponse(
            endpoint=path,
            data=data,
            elapsed_ms=elapsed,
            errors=errors,
        )

    def aggregate(self, endpoints: list[str]) -> dict[str, AggregatedResponse]:
        """Request all endpoints concurrently and aggregate the results."""
        results: dict[str, AggregatedResponse] = {}

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            future_to_endpoint: dict[Future[AggregatedResponse], str] = {
                executor.submit(self.fetch_endpoint, ep): ep
                for ep in endpoints
            }

            for future in as_completed(future_to_endpoint):
                endpoint = future_to_endpoint[future]
                try:
                    results[endpoint] = future.result()
                except Exception as e:
                    results[endpoint] = AggregatedResponse(
                        endpoint=endpoint,
                        data={},
                        elapsed_ms=0,
                        errors=[str(e)],
                    )

        return results


# Usage example
def main():
    endpoints = [
        "/users/1",
        "/users/2",
        "/posts/1",
        "/posts/2",
        "/comments/1",
        "/todos/1",
    ]

    with APIAggregator("https://jsonplaceholder.typicode.com") as agg:
        start = time.perf_counter()
        results = agg.aggregate(endpoints)
        total_time = (time.perf_counter() - start) * 1000

    # Report
    print(f"\nAggregated {len(results)} endpoints in {total_time:.0f}ms")
    print(f"Longest single request: {max(r.elapsed_ms for r in results.values()):.0f}ms")

    error_count = sum(len(r.errors) for r in results.values())
    if error_count:
        print(f"Errors: {error_count}")
        for ep, r in results.items():
            if r.errors:
                print(f"  {ep}: {r.errors}")


if __name__ == "__main__":
    main()

Choosing `max_workers` systematically

import os

def optimal_workers(task_type: str) -> int:
    """Recommend max_workers based on the task type."""
    cpu_count = os.cpu_count() or 4

    if task_type == "io":
        # I/O-bound: many threads are fine because the GIL is released during I/O
        # Rule of thumb: CPU cores × 5, capped at 50
        return min(cpu_count * 5, 50)

    elif task_type == "cpu":
        # CPU-bound: match the number of CPU cores
        # More workers add context-switch overhead without extra speedup
        return cpu_count

    elif task_type == "mixed":
        # Mixed: 1.5 × CPU cores as a compromise
        return int(cpu_count * 1.5)

    else:
        return cpu_count

# Defaults when max_workers is omitted (Python 3.8+):
# ThreadPoolExecutor: min(32, (os.cpu_count() or 1) + 4)
# ProcessPoolExecutor: os.cpu_count() or 1

Verifying the best `max_workers`

import time
from concurrent.futures import ThreadPoolExecutor

def verify_io_workers():
    """Find the best I/O thread count by experiment."""

    def io_task(n: int) -> None:
        time.sleep(0.1)  # simulate I/O

    task_count = 200
    for workers in [1, 2, 4, 8, 16, 32, 64, 128]:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            list(executor.map(io_task, range(task_count)))
        elapsed = time.perf_counter() - start
        print(f"  Workers: {workers:>3}  →  {elapsed:.2f}s  "
              f"({task_count / elapsed:.0f} tasks/s)")

verify_io_workers()
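The measurements from the experiment can be compared against a simple model: with identical tasks that each block for a fixed time, the pool works through them in "waves" of at most `workers` tasks. The function name `ideal_elapsed` is hypothetical, and the model ignores thread startup and scheduling overhead.

```python
import math

def ideal_elapsed(task_count: int, workers: int, task_seconds: float) -> float:
    """Lower bound on wall time when every task blocks for task_seconds."""
    waves = math.ceil(task_count / workers)  # batches of up to `workers` tasks
    return waves * task_seconds

# Same parameters as the experiment: 200 tasks, 0.1 s each
for workers in [1, 2, 4, 8, 16, 32, 64, 128]:
    print(f"workers={workers:>3}  ideal={ideal_elapsed(200, workers, 0.1):.1f}s")
```

The gap between the measured time and this bound is the pool's overhead. Once adding workers no longer reduces the number of waves, extra threads cannot help.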

Pitfalls

Pitfall 1: passing non-picklable objects to a process pool

from concurrent.futures import ProcessPoolExecutor

# ✅ Use a named, module-level function
def double(x):
    return x * 2

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # ❌ A lambda cannot be pickled; the error surfaces when the result is fetched
        # executor.submit(lambda x: x * 2, 5).result()  # PicklingError

        future = executor.submit(double, 5)
        print(future.result())  # 10

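Pitfall 2: modifying a global variable in a process pool has no effect on the parent

from concurrent.futures import ProcessPoolExecutor

# ❌ A global modified inside a pool worker only changes in that child process
counter = 0

def increment(_):  # map passes one argument per task
    global counter
    counter += 1
    return counter

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(increment, range(10)))
        # Each worker process has its own copy of counter, so the values
        # depend on how the 10 tasks are spread across the workers

    print(counter)  # 0: the parent's counter is never modified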

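Pitfall 3: forgetting to shut down the Executor

from concurrent.futures import ThreadPoolExecutor

def task():  # placeholder task for this snippet
    return "ok"

# ❌ An Executor that is never shut down can leak resources
executor = ThreadPoolExecutor(max_workers=4)
executor.submit(task)

# ✅ Use a context manager to shut it down automatically
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.submit(task)
# Leaving the with block calls executor.shutdown(wait=True)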
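When a context manager does not fit, `shutdown` can be called explicitly. The sketch below assumes Python 3.9+, where `shutdown` accepts `cancel_futures`, and uses a hypothetical `slow` task with a single worker so that most tasks are still queued at shutdown time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow(n: int) -> int:
    time.sleep(0.2)
    return n

executor = ThreadPoolExecutor(max_workers=1)
futures = [executor.submit(slow, i) for i in range(5)]

# cancel_futures (Python 3.9+) cancels tasks that have not started yet;
# a task that is already running is allowed to finish.
executor.shutdown(wait=True, cancel_futures=True)

cancelled_count = sum(f.cancelled() for f in futures)
finished = [f.result() for f in futures if not f.cancelled()]
print(f"cancelled: {cancelled_count}, finished: {finished}")
```

How many tasks get cancelled depends on timing: anything a worker has already picked up runs to completion, and everything still in the queue is dropped.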

Pitfall 4: process pool overhead exceeds the benefit

from concurrent.futures import ProcessPoolExecutor
import time

def trivial_task(n: int) -> int:
    return n * 2

if __name__ == "__main__":
    # ❌ Process startup + serialization cost > task execution time
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        list(executor.map(trivial_task, range(1000)))
    print(f"ProcessPool trivial: {time.perf_counter() - start:.3f}s")

    # ✅ For trivial tasks, use a thread pool or a plain loop
    start = time.perf_counter()
    list(map(trivial_task, range(1000)))
    print(f"Direct trivial: {time.perf_counter() - start:.3f}s")

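`concurrent.futures` API quick reference

API	Purpose	Notes
`executor.submit(fn, *args)`	Submit a task, returns a Future	Returns immediately, does not block
`executor.map(fn, *iterables)`	Submit a batch, returns an iterator	Preserves input order; tasks are submitted immediately, results are yielded lazily
`executor.shutdown(wait=True)`	Shut down the Executor	Called automatically by the context manager
`future.result(timeout=None)`	Get the result	Blocks until the task is done or the timeout expires
`future.exception(timeout=None)`	Get the exception without raising it	Returns None if there was no exception
`future.done()`	Whether the task has finished or been cancelled	Non-blocking
`future.cancel()`	Cancel the task	Only works before the task starts running
`future.add_done_callback(fn)`	Register a completion callback	Runs in the thread that completes the Future, or immediately in the calling thread if it is already done
`as_completed(futures)`	Iterate in completion order	Generator
`wait(futures, return_when)`	Wait on a group of Futures	Returns two sets: done and not_done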

The concurrency landscape

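Starting point: a need for concurrency or parallelism. The first question is the task type; for I/O-bound work, the second is whether the code is synchronous or asynchronous.

Situation	Recommended tool	Notes
CPU-bound (computation, processing, encryption and decryption)	ProcessPoolExecutor	Bypasses the GIL for true parallelism; max_workers = CPU cores
I/O-bound (network, files, databases), synchronous code	ThreadPoolExecutor	Thread-pool I/O concurrency; max_workers = CPU cores × 5
I/O-bound, asynchronous code (async/await)	asyncio + httpx/aiohttp	Coroutine I/O concurrency; no pool needed, the event loop schedules the work
Async code calling a synchronous function	run_in_executor or asyncio.to_thread()	The call runs in a thread pool
Async code calling CPU-bound work	run_in_executor with a ProcessPoolExecutor	The work runs in a separate process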

