concurrent.futures in Practice: A Unified Abstraction over Process Pools and Thread Pools

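The GIL limits CPU parallelism across Python threads, but it does not limit I/O concurrency. concurrent.futures manages thread pools and process pools through one unified API; pick the right pool and the performance gap can reach 10x.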


Why pools: the cost of creating threads and processes

Creating even a single thread or process has a cost that cannot be ignored:

import time
import threading
import multiprocessing

def dummy():
    pass

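if __name__ == "__main__":  # guard needed where child processes are spawned (Windows, macOS)
    # Measure the cost of creating threads
    start = time.perf_counter()
    for _ in range(1000):
        t = threading.Thread(target=dummy)
        t.start()
        t.join()
    print(f"1000 threads: {time.perf_counter() - start:.3f}s")  # about 0.5-1.0s; varies by machine

    # Measure the cost of creating processes
    start = time.perf_counter()
    for _ in range(100):
        p = multiprocessing.Process(target=dummy)
        p.start()
        p.join()
    print(f"100 processes: {time.perf_counter() - start:.3f}s")  # about 1.0-2.0s; varies by machine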

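Thread pools and process pools keep a bounded set of worker threads or processes alive and reuse them: an incoming task is handed to an idle worker, and the worker goes back to the pool when the task is done, which avoids creating and destroying one per task:
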
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def task(n: int) -> int:
    return n * n

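if __name__ == "__main__":  # the process pool needs this guard on spawn-based platforms
    # Thread pool: at most 4 worker threads, reused across tasks
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, i) for i in range(100)]
        results = [f.result() for f in futures]

    # Process pool: at most 4 worker processes, reused across tasks
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, i) for i in range(100)]
        results = [f.result() for f in futures]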

ThreadPoolExecutor vs ProcessPoolExecutor: choosing between them

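Need to run tasks in parallel → what type of task?

  • I/O-bound (network requests, file reads and writes, database queries)
    → ThreadPoolExecutor: the GIL is released during I/O

  • CPU-bound (numeric computation, image processing, encryption and decryption)
    → ProcessPoolExecutor: bypasses the GIL for true parallelism

  • Mixed → is there shared state?
      • Lots of shared data → ThreadPoolExecutor, with a Lock protecting the shared state
      • Independent data → ProcessPoolExecutor, with data passed by serialization

max_workers rule of thumb: CPU cores × 5 for ThreadPoolExecutor, CPU cores for ProcessPoolExecutor.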
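
The decision tree above can be condensed into a small helper. This is only a sketch: `pick_executor` and its `task_type` / `shares_state` parameters are hypothetical names, not part of concurrent.futures, and the ×5 multiplier is simply the rule of thumb from the chart.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor(
    task_type: str, shares_state: bool = False
) -> tuple[type[Executor], int]:
    """Return (executor class, suggested max_workers).

    Hypothetical helper: task_type is "io", "cpu" or "mixed".
    """
    cores = os.cpu_count() or 1
    if task_type == "io":
        return ThreadPoolExecutor, cores * 5
    if task_type == "cpu":
        return ProcessPoolExecutor, cores
    if task_type == "mixed":
        # Heavy shared state favours threads (guarded by a Lock);
        # independent data favours processes (arguments get pickled).
        if shares_state:
            return ThreadPoolExecutor, cores * 5
        return ProcessPoolExecutor, cores
    raise ValueError(f"unknown task type: {task_type!r}")


executor_class, workers = pick_executor("mixed", shares_state=True)
print(executor_class.__name__, workers)
```

Returning the class rather than an instance leaves pool lifetime to the caller, who can still wrap it in a `with` block.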

Performance comparison experiment

import time
import math
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# CPU-bound task: counting primes
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

def count_primes_in_range(start: int, end: int) -> int:
    return sum(1 for n in range(start, end) if is_prime(n))

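def benchmark(executor_class, max_workers: int):
    """Time one executor class on a CPU-bound task"""
    # Half-open [start, end) chunks with shared boundaries, so no number is skipped
    ranges = [
        (1, 50_000),
        (50_000, 100_000),
        (100_000, 150_000),
        (150_000, 200_000),
    ]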
    
    start = time.perf_counter()
    with executor_class(max_workers=max_workers) as executor:
        futures = [executor.submit(count_primes_in_range, s, e) for s, e in ranges]
        results = [f.result() for f in futures]
    
    elapsed = time.perf_counter() - start
    total = sum(results)
    return elapsed, total

# Comparison
import os

if __name__ == "__main__":  # required for ProcessPoolExecutor on spawn-based platforms
    cpu_count = os.cpu_count() or 4
    print(f"CPU cores: {cpu_count}\n")

    t_elapsed, t_total = benchmark(ThreadPoolExecutor, cpu_count)
    p_elapsed, p_total = benchmark(ProcessPoolExecutor, cpu_count)

    print(f"ThreadPool: {t_elapsed:.2f}s, found {t_total} primes")
    print(f"ProcessPool: {p_elapsed:.2f}s, found {p_total} primes")
    print(f"Speedup: {t_elapsed / p_elapsed:.1f}x")

Typical output (4-core machine; absolute timings vary with hardware):

ThreadPool: 8.52s, found 17984 primes
ProcessPool: 2.41s, found 17984 primes
Speedup: 3.5x

Three ways to submit work to an Executor

Option 1: submit + Future.result()

from concurrent.futures import ThreadPoolExecutor, as_completed
import time
import random

def fetch_url(url: str) -> str:
    """Simulated network request"""
    delay = random.uniform(0.5, 2.0)
    time.sleep(delay)
    return f"Fetched {url} in {delay:.2f}s"

with ThreadPoolExecutor(max_workers=5) as executor:
    # submit returns a Future immediately, without blocking
    future_to_url = {
        executor.submit(fetch_url, f"https://api.example.com/item/{i}"): i
        for i in range(10)
    }
    
    # Handle results in completion order
    for future in as_completed(future_to_url):
        url_index = future_to_url[future]
        try:
            result = future.result()
            print(f"  [{url_index}] {result}")
        except Exception as e:
            print(f"  [{url_index}] Failed: {e}")

Option 2: map, batch submission that preserves order

from concurrent.futures import ThreadPoolExecutor

def process(n: int) -> int:
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    # map returns an iterator that preserves input order
    results = executor.map(process, range(10))
    print(list(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

map vs submit

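Dimension | submit | map
When it returns | Immediately, with a Future | Immediately, with a result iterator (all tasks are submitted up front)
Result order | Completion order, via as_completed | Input order
Exception handling | Raised by future.result() | Raised when iteration reaches the failing element
Best for | Handling results as they finish; per-task timeouts or cancellation | Batch operations whose results must stay in input order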
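
The exception-handling row is the one that tends to surprise people. A minimal sketch (the `parse` function and its inputs are made up for illustration): with `map`, the first failing element raises during iteration and later results can no longer be reached through that iterator, while `submit` lets each Future be inspected on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def parse(text: str) -> int:
    """Illustrative task: raises ValueError for non-numeric input."""
    return int(text)


inputs = ["1", "2", "oops", "4"]

with ThreadPoolExecutor(max_workers=2) as executor:
    # map: results arrive in input order; the error surfaces at "oops"
    map_results = []
    try:
        for value in executor.map(parse, inputs):
            map_results.append(value)
    except ValueError:
        pass  # the iterator is finished; "4" is never yielded

    # submit: every Future is checked independently
    futures = [executor.submit(parse, text) for text in inputs]
    submit_results = []
    for future in futures:
        try:
            submit_results.append(future.result())
        except ValueError:
            submit_results.append(None)

print(map_results)     # [1, 2]
print(submit_results)  # [1, 2, None, 4]
```

If partial results matter, `submit` is usually the safer choice; `map` fits best when any failure should abort the whole batch.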

Option 3: submit + wait, for fine-grained control

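import random
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED

def fetch_url(url: str) -> str:
    """Simulated network request (same stand-in as in Option 1)"""
    delay = random.uniform(0.5, 2.0)
    time.sleep(delay)
    return f"Fetched {url} in {delay:.2f}s"

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(fetch_url, f"url-{i}"): i for i in range(20)}

    # Block until at least one Future has finished
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    print(f"First completed: {next(iter(done)).result()}")
    print(f"Still pending: {len(not_done)}")

    # Then block until every Future has finished
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    print(f"All {len(done)} tasks completed")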

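return_when options for wait

Constant | Behavior
ALL_COMPLETED (default) | Return when every Future has finished or been cancelled
FIRST_COMPLETED | Return when any Future finishes or is cancelled
FIRST_EXCEPTION | Return when any Future raises an exception; if none does, behaves like ALL_COMPLETED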
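
FIRST_EXCEPTION suits fail-fast batches. The sketch below uses a hypothetical `check` task: it waits until some task raises, then cancels whatever has not started yet. The exact number of cancelled tasks depends on scheduling.

```python
import time
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


def check(n: int) -> int:
    """Illustrative task: item 0 fails at once, the rest take a while."""
    if n == 0:
        raise ValueError("bad item 0")
    time.sleep(0.2)
    return n


with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(check, n) for n in range(6)]
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)

    # Futures still waiting in the queue can be cancelled; running ones cannot
    cancelled = [f for f in not_done if f.cancel()]
    failed = [f for f in done if f.exception() is not None]

print(f"failed: {len(failed)}, cancelled before start: {len(cancelled)}")
```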

The Future object's API

import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow_task(n: int) -> int:
    time.sleep(n)
    return n * 2

with ThreadPoolExecutor() as executor:
    future = executor.submit(slow_task, 5)
    
    # State checks (a worker normally picks the task up almost immediately)
    print(f"Running: {future.running()}")    # normally True
    print(f"Done: {future.done()}")          # False
    print(f"Cancelled: {future.cancelled()}") # False
    
    # Cancel: only succeeds while the task has not started running
    cancelled = future.cancel()
    print(f"Cancel attempt: {cancelled}")     # False once the task is running
    
    # Fetch the result with a timeout
    try:
        result = future.result(timeout=2)     # give up waiting after 2 seconds
    except TimeoutError:
        print("Task timed out!")              # the task itself keeps running
    
    # Register a completion callback; it fires when the task finishes
    def on_done(f):
        print(f"Task completed: {f.result()}")
    
    future.add_done_callback(on_done)

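The bridge between asyncio and concurrent.futures

An asyncio application will inevitably call synchronous functions (third-party libraries, CPU-heavy computation). run_in_executor is the bridge between the asynchronous and synchronous worlds:
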
import asyncio
import time
import hashlib
from concurrent.futures import ProcessPoolExecutor

# CPU-heavy work: hashlib's PBKDF2
def hash_password(password: str, salt: str) -> str:
    """Synchronous password hashing (CPU-heavy)"""
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode(),
        salt.encode(),
        600_000,  # iteration count
    ).hex()

# Default thread pool
async def hash_with_thread_pool(password: str, salt: str) -> str:
    """Run in the loop's default thread pool.

    Pure-Python CPU-bound code gains little from threads. CPython's
    OpenSSL-backed pbkdf2_hmac releases the GIL while it works, so this
    particular call does overlap across threads.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, hash_password, password, salt)

# Dedicated process pool: create it once and pass it in, not once per call
async def hash_with_process_pool(
    pool: ProcessPoolExecutor, password: str, salt: str
) -> str:
    """Run in a shared process pool for true parallelism"""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, hash_password, password, salt)

# Python 3.9+: a more concise spelling
async def hash_with_to_thread(password: str, salt: str) -> str:
    """asyncio.to_thread: syntactic sugar for the default thread pool"""
    return await asyncio.to_thread(hash_password, password, salt)


# Usage example
async def main():
    passwords = [
        ("user_password_1", "salt_abc"),
        ("user_password_2", "salt_def"),
        ("user_password_3", "salt_ghi"),
    ]
    
    # Hash concurrently: each call runs in its own worker thread
    start = time.perf_counter()
    tasks = [hash_with_to_thread(pw, salt) for pw, salt in passwords]
    results = await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    
    print(f"Hashed {len(passwords)} passwords in {elapsed:.2f}s")
    for (pw, _), result in zip(passwords, results):
        print(f"  {pw} → {result[:16]}...")

if __name__ == "__main__":
    asyncio.run(main())

How run_in_executor works

import asyncio
import time

# What run_in_executor does internally
async def run_in_executor_explained():
    loop = asyncio.get_running_loop()
    
    def blocking_task():
        time.sleep(2)  # synchronous, blocking
        return "done"
    
    # 1. Submit blocking_task to the thread pool
    # 2. Create a Future on the event loop
    # 3. When the pool finishes, hand the result back to the event loop
    #    via call_soon_threadsafe
    result = await loop.run_in_executor(None, blocking_task)
    
    # Roughly equivalent to:
    # concurrent_future = executor.submit(blocking_task)
    # asyncio_future = asyncio.wrap_future(concurrent_future)
    # result = await asyncio_future
    return result
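
The commented-out equivalence can be made runnable. A rough sketch: it mirrors what run_in_executor does conceptually rather than CPython's exact implementation, and `blocking_task` is the same kind of stand-in as above, with a shorter sleep.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task() -> str:
    time.sleep(0.1)  # stands in for any synchronous, blocking call
    return "done"


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # The convenient form
        via_loop = await loop.run_in_executor(executor, blocking_task)

        # The manual form: submit, then wrap the concurrent Future
        concurrent_future = executor.submit(blocking_task)
        via_wrap = await asyncio.wrap_future(concurrent_future)
    return via_loop, via_wrap


results = asyncio.run(main())
print(results)  # ('done', 'done')
```

Both forms leave the event loop free to run other coroutines while the worker thread sleeps.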

Engineering practice 1: batch image processing

"""Batch image processing: a process pool for CPU-heavy image operations"""
import os
import time
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor, as_completed, Future
from typing import NamedTuple

from PIL import Image


class ProcessResult(NamedTuple):
    path: str
    original_size: tuple[int, int]
    new_size: tuple[int, int]
    original_bytes: int
    new_bytes: int
    elapsed_ms: float


def process_single_image(
    image_path: str,
    output_dir: str,
    max_size: tuple[int, int] = (1200, 1200),
    quality: int = 85,
) -> ProcessResult:
    """Process one image: resize, optimize, convert to JPEG"""
    start = time.perf_counter()
    original_bytes = os.path.getsize(image_path)

    with Image.open(image_path) as img:
        original_size = img.size

        # Resize in place, preserving the aspect ratio
        img.thumbnail(max_size, Image.Resampling.LANCZOS)
        new_size = img.size

        # JPEG has no alpha channel or palette: convert other modes to RGB
        if img.mode not in ("RGB", "L"):
            img = img.convert("RGB")

        # Save into the output directory
        output_path = Path(output_dir) / f"{Path(image_path).stem}.jpg"
        img.save(output_path, "JPEG", quality=quality, optimize=True)

    new_bytes = os.path.getsize(output_path)
    elapsed = (time.perf_counter() - start) * 1000

    return ProcessResult(
        path=str(output_path),
        original_size=original_size,
        new_size=new_size,
        original_bytes=original_bytes,
        new_bytes=new_bytes,
        elapsed_ms=elapsed,
    )


def batch_process_images(
    image_paths: list[str],
    output_dir: str,
    max_workers: int | None = None,
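) -> dict[str, ProcessResult]:
    """Process images in batch using a process pool"""
    max_workers = max_workers or os.cpu_count() or 4
    Path(output_dir).mkdir(parents=True, exist_ok=True)  # img.save does not create directories

    results: dict[str, ProcessResult] = {}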
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_path: dict[Future[ProcessResult], str] = {
            executor.submit(process_single_image, path, output_dir): path
            for path in image_paths
        }

        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                result = future.result()
                results[path] = result
            except Exception as e:
                print(f"Failed to process {path}: {e}")

    return results


# Usage
if __name__ == "__main__":
    images = [str(p) for p in Path("photos").glob("*.jpg")]
    results = batch_process_images(images, "photos/processed")

    total_saved = sum(
        r.original_bytes - r.new_bytes for r in results.values()
    )
    print(f"\nProcessed {len(results)} images")
    print(f"Total space saved: {total_saved / (1024 * 1024):.1f} MB")

Engineering practice 2: a concurrent API aggregator

"""Concurrent API aggregator: a thread pool for I/O-bound HTTP requests"""
import time
from typing import Any
from dataclasses import dataclass, field
from concurrent.futures import ThreadPoolExecutor, Future, as_completed

import httpx


@dataclass
class AggregatedResponse:
    """Aggregated response: one entry in the merged result of several API calls"""
    endpoint: str
    data: dict[str, Any]
    elapsed_ms: float
    errors: list[str] = field(default_factory=list)


class APIAggregator:
    """Concurrent API aggregator: requests several endpoints at once and merges the responses"""

    def __init__(
        self,
        base_url: str,
        max_workers: int = 10,
        timeout: float = 30.0,
    ):
        self.base_url = base_url
        self.max_workers = max_workers
        self.timeout = timeout
        self._client: httpx.Client | None = None

    def __enter__(self):
        self._client = httpx.Client(
            base_url=self.base_url,
            timeout=self.timeout,
        )
        return self

    def __exit__(self, *args):
        if self._client:
            self._client.close()

    def fetch_endpoint(self, path: str) -> AggregatedResponse:
        """Fetch a single endpoint (runs in a worker thread)"""
        if not self._client:
            raise RuntimeError("Use as context manager")

        start = time.perf_counter()
        try:
            response = self._client.get(path)
            response.raise_for_status()
            data = response.json()
            errors = []
        except httpx.HTTPError as e:
            data = {}
            errors = [str(e)]

        elapsed = (time.perf_counter() - start) * 1000
        return AggregatedResponse(
            endpoint=path,
            data=data,
            elapsed_ms=elapsed,
            errors=errors,
        )

    def aggregate(self, endpoints: list[str]) -> dict[str, AggregatedResponse]:
        """Request all endpoints concurrently and aggregate the results"""
        results: dict[str, AggregatedResponse] = {}

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            future_to_endpoint: dict[Future[AggregatedResponse], str] = {
                executor.submit(self.fetch_endpoint, ep): ep
                for ep in endpoints
            }

            for future in as_completed(future_to_endpoint):
                endpoint = future_to_endpoint[future]
                try:
                    results[endpoint] = future.result()
                except Exception as e:
                    results[endpoint] = AggregatedResponse(
                        endpoint=endpoint,
                        data={},
                        elapsed_ms=0,
                        errors=[str(e)],
                    )

        return results


# Usage example
def main():
    endpoints = [
        "/users/1",
        "/users/2",
        "/posts/1",
        "/posts/2",
        "/comments/1",
        "/todos/1",
    ]

    with APIAggregator("https://jsonplaceholder.typicode.com") as agg:
        start = time.perf_counter()
        results = agg.aggregate(endpoints)
        total_time = (time.perf_counter() - start) * 1000

    # Report
    print(f"\nAggregated {len(results)} endpoints in {total_time:.0f}ms")
    print(f"Longest single request: {max(r.elapsed_ms for r in results.values()):.0f}ms")

    error_count = sum(len(r.errors) for r in results.values())
    if error_count:
        print(f"Errors: {error_count}")
        for ep, r in results.items():
            if r.errors:
                print(f"  {ep}: {r.errors}")


if __name__ == "__main__":
    main()

Choosing max_workers methodically

import os

def optimal_workers(task_type: str) -> int:
    """Recommend max_workers for a task type"""
    cpu_count = os.cpu_count() or 4

    if task_type == "io":
        # I/O-bound: many threads, because the GIL is released during I/O
        # Rule of thumb: CPU cores × 5, capped at 50
        return min(cpu_count * 5, 50)

    elif task_type == "cpu":
        # CPU-bound: equal to the number of CPU cores
        # More workers add no speed, only context-switch overhead
        return cpu_count

    elif task_type == "mixed":
        # Mixed: 1.5 × CPU cores as a compromise
        return int(cpu_count * 1.5)

    else:
        return cpu_count

# Python 3.8+ picks reasonable defaults when max_workers is omitted
# ThreadPoolExecutor: min(32, (os.cpu_count() or 1) + 4)
# ProcessPoolExecutor: os.cpu_count() or 1 (at most 61 on Windows)

Verifying the best max_workers

import time
from concurrent.futures import ThreadPoolExecutor

def verify_io_workers():
    """Find a good I/O thread count by experiment"""
    
    def io_task(n: int) -> None:
        time.sleep(0.1)  # simulated I/O

    task_count = 200
    for workers in [1, 2, 4, 8, 16, 32, 64, 128]:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            list(executor.map(io_task, range(task_count)))
        elapsed = time.perf_counter() - start
        print(f"  Workers: {workers:>3}  →  {elapsed:.2f}s  "
              f"({task_count / elapsed:.0f} tasks/s)")

verify_io_workers()

Pitfalls to avoid

Pitfall 1: passing unpicklable objects to a process pool

from concurrent.futures import ProcessPoolExecutor

# ✅ Use a named, module-level function
def double(x):
    return x * 2

if __name__ == "__main__":
    # ❌ A lambda cannot be pickled; the error surfaces from result()
    # with ProcessPoolExecutor() as executor:
    #     executor.submit(lambda x: x * 2, 5).result()  # PicklingError

    with ProcessPoolExecutor() as executor:
        future = executor.submit(double, 5)
        print(future.result())  # 10

Pitfall 2: modifying a global variable inside a process pool has no effect

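from concurrent.futures import ProcessPoolExecutor

# ❌ A global modified in a worker only changes that worker process's copy
counter = 0

def increment(_: int) -> int:  # map passes one argument per call
    global counter
    counter += 1
    return counter

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(increment, range(10)))
        # results depends on scheduling, e.g. [1, 1, 2, 1, ...]:
        # each worker process counts in its own copy of counter

    print(counter)  # 0: the parent process's counter was never modified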
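
One way out, sketched under the assumption that the work can be expressed as pure functions: have each task return its contribution and combine the results in the parent process. `count_events` and `total_events` are hypothetical names; the executor class is a parameter so the same code can be tried with either pool.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def count_events(batch: list[int]) -> int:
    """Illustrative task: return a partial count instead of mutating a global."""
    return sum(1 for value in batch if value % 2 == 0)


def total_events(executor_class: type[Executor], batches: list[list[int]]) -> int:
    """Aggregate in the parent: results travel back as return values."""
    with executor_class(max_workers=2) as executor:
        return sum(executor.map(count_events, batches))


if __name__ == "__main__":
    batches = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
    print(total_events(ProcessPoolExecutor, batches))  # 15
    print(total_events(ThreadPoolExecutor, batches))   # 15
```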

Pitfall 3: forgetting to shut down the Executor

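from concurrent.futures import ThreadPoolExecutor

def task(n: int) -> int:
    return n * n

# ❌ Without a shutdown, worker threads and queued work may linger
executor = ThreadPoolExecutor(max_workers=4)
executor.submit(task, 3)

# ✅ A context manager shuts the pool down automatically
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.submit(task, 3)
# Leaving the with block calls executor.shutdown(wait=True)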
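
When a context manager does not fit (for example, a long-lived pool owned by an application object), `shutdown` can be called explicitly. Since Python 3.9 it also accepts `cancel_futures=True` to drop work that has not started. A sketch with a stand-in `slow` task; how many tasks end up cancelled depends on timing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow(n: int) -> int:
    time.sleep(0.1)
    return n


executor = ThreadPoolExecutor(max_workers=1)
try:
    futures = [executor.submit(slow, n) for n in range(5)]
    first = futures[0].result()  # wait for one result, then decide to stop early
finally:
    # Running tasks finish; tasks still in the queue are cancelled (Python 3.9+)
    executor.shutdown(wait=True, cancel_futures=True)

cancelled = sum(f.cancelled() for f in futures)
print(f"first result: {first}, cancelled: {cancelled}")
```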

Pitfall 4: process-pool overhead outweighs the benefit

from concurrent.futures import ProcessPoolExecutor
import time

def trivial_task(n: int) -> int:
    return n * 2

if __name__ == "__main__":
    # ❌ Process startup + serialization cost > time spent in the task
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        list(executor.map(trivial_task, range(1000)))
    print(f"ProcessPool trivial: {time.perf_counter() - start:.3f}s")

    # ✅ For trivial tasks, use a thread pool or a plain loop
    start = time.perf_counter()
    list(map(trivial_task, range(1000)))
    print(f"Direct trivial: {time.perf_counter() - start:.3f}s")
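
When small tasks really must go through a process pool, batching them reduces the per-item pickling and inter-process overhead. `Executor.map` accepts a `chunksize` argument for this; it only has an effect with ProcessPoolExecutor. A sketch: `timed_map` is a hypothetical helper and the chunk size of 250 is an arbitrary choice, so measure on your own workload.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def trivial_task(n: int) -> int:
    return n * 2


def timed_map(
    executor_class: type[Executor], chunksize: int, count: int = 1000
) -> tuple[float, list[int]]:
    """Map trivial_task over range(count); return (seconds, results)."""
    start = time.perf_counter()
    with executor_class(max_workers=2) as executor:
        results = list(executor.map(trivial_task, range(count), chunksize=chunksize))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # chunksize only matters for processes; threads accept it but ignore it
    for executor_class in (ThreadPoolExecutor, ProcessPoolExecutor):
        for chunksize in (1, 250):
            elapsed, results = timed_map(executor_class, chunksize)
            print(f"{executor_class.__name__} chunksize={chunksize:>3}: {elapsed:.3f}s")
```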

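concurrent.futures API quick reference

API | Purpose | Notes
executor.submit(fn, *args) | Submit a task, get a Future | Returns immediately, does not block
executor.map(fn, *iterables) | Submit in batch, get an iterator | Results come back in input order; tasks are submitted eagerly
executor.shutdown(wait=True) | Shut down the Executor | Called automatically by the context manager
future.result(timeout=None) | Get the result | Blocks until done or timed out; re-raises the task's exception
future.exception(timeout=None) | Get the exception without raising it | Returns None if the task succeeded
future.done() | Whether the Future has finished or been cancelled | Non-blocking
future.cancel() | Cancel the task | Only works before the task starts running
future.add_done_callback(fn) | Register a completion callback | Runs in the process that added it, usually on the thread that completes the Future
as_completed(futures) | Iterate in completion order | Returns an iterator
wait(futures, return_when=...) | Wait for a set of Futures | Returns two sets, done and not_done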
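
Two entries in the table are easy to mix up: `result()` re-raises a task's exception, while `exception()` hands it back as a value. A small sketch with a hypothetical `divide` task, also showing that a done-callback receives the Future itself:

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a: float, b: float) -> float:
    return a / b


finished: list[str] = []

with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(divide, 6, 3)
    bad = executor.submit(divide, 1, 0)

    # Callbacks get the Future; inspect it without raising
    for future in (ok, bad):
        future.add_done_callback(
            lambda f: finished.append("error" if f.exception() else "ok")
        )

    ok_error = ok.exception()    # None: the task succeeded
    bad_error = bad.exception()  # a ZeroDivisionError instance, returned not raised

print(ok.result())       # 2.0
print(type(bad_error).__name__)
print(sorted(finished))  # ['error', 'ok']
```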

The concurrency landscape

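Concurrency or parallelism needed → what type of task?

  • CPU-bound (computation, processing, encryption and decryption)
    → ProcessPoolExecutor: bypasses the GIL for true parallelism
    → max_workers = CPU cores

  • I/O-bound (network, files, databases) → is the code synchronous or asynchronous?
      • Synchronous → ThreadPoolExecutor: thread-pool I/O concurrency
        → max_workers = CPU cores × 5
      • Asynchronous (async/await) → asyncio + httpx/aiohttp: coroutine I/O concurrency
        → no pool needed, the event loop does the scheduling
          • Calling synchronous code from async → run_in_executor or asyncio.to_thread()
          • Calling CPU-bound code from async → run_in_executor + ProcessPoolExecutor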
