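The GIL limits CPU parallelism across Python threads, but it does not limit I/O concurrency.
concurrent.futures manages thread pools and process pools through one unified API. Picking the right pool matters: the performance gap can reach about 10x, depending on the workload.
Why pools: the cost of creating threads and processes
Creating a single thread or process has a non-negligible cost: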
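```python
import time
import threading
import multiprocessing


def dummy():
    pass


if __name__ == "__main__":  # guard needed where processes start via spawn
    # Measure thread creation cost
    start = time.perf_counter()
    for _ in range(1000):
        t = threading.Thread(target=dummy)
        t.start()
        t.join()
    # The article cites roughly 0.5 to 1.0 s; actual numbers vary by machine
    print(f"1000 threads: {time.perf_counter() - start:.3f}s")

    # Measure process creation cost
    start = time.perf_counter()
    for _ in range(100):
        p = multiprocessing.Process(target=dummy)
        p.start()
        p.join()
    # The article cites roughly 1.0 to 2.0 s; depends on OS and start method
    print(f"100 processes: {time.perf_counter() - start:.3f}s")
```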
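Thread pools and process pools keep a bounded set of workers alive and reuse them: an incoming task is handed to an idle worker, and the worker goes back to the pool when the task finishes. This avoids creating and destroying a thread or process for every task:
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def task(n: int) -> int:
    return n * n


if __name__ == "__main__":
    # Thread pool: at most 4 worker threads, reused across tasks
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, i) for i in range(100)]
        results = [f.result() for f in futures]

    # Process pool: at most 4 worker processes, reused across tasks
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(task, i) for i in range(100)]
        results = [f.result() for f in futures]
```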
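ThreadPoolExecutor vs ProcessPoolExecutor: choosing a pool
When tasks need to run in parallel, start from the task type; for mixed workloads, ask whether the tasks share state:
| Task type | Examples / condition | Recommended pool | Why |
|---|---|---|---|
| I/O-bound | Network requests, file reads and writes, database queries | ThreadPoolExecutor | The GIL is released during I/O |
| CPU-bound | Numeric computation, image processing, encryption and decryption | ProcessPoolExecutor | Bypasses the GIL for true parallelism |
| Mixed | Large amounts of shared data | ThreadPoolExecutor | Protect shared state with a Lock |
| Mixed | Independent data | ProcessPoolExecutor | Data is passed between processes by serialization |

Rule of thumb for max_workers: about CPU cores × 5 for a ThreadPoolExecutor, and CPU cores for a ProcessPoolExecutor.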
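The "protect shared state with a Lock" case can be sketched as follows. This is a minimal illustration rather than code from the article: the `Inventory` class and its `reserve` method are hypothetical names, and the only library pieces used are `threading.Lock` and `ThreadPoolExecutor`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Inventory:
    """Hypothetical shared state: a stock counter touched by many threads."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = threading.Lock()

    def reserve(self, quantity: int) -> bool:
        # The check and the update must happen atomically,
        # so both steps run while holding the lock.
        with self._lock:
            if self.stock >= quantity:
                self.stock -= quantity
                return True
            return False


inventory = Inventory(stock=100)

with ThreadPoolExecutor(max_workers=8) as executor:
    # 150 requests compete for 100 units; exactly 100 can succeed.
    outcomes = list(executor.map(inventory.reserve, [1] * 150))

successful = sum(outcomes)
print(f"reserved={successful}, remaining={inventory.stock}")
```

Keeping the lock inside the object that owns the state means callers cannot forget to take it.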
Performance comparison experiment
```python
import math
import os
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


# CPU-bound task: count primes by trial division
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


def count_primes_in_range(start: int, end: int) -> int:
    # Half-open range: counts primes in [start, end)
    return sum(1 for n in range(start, end) if is_prime(n))


def benchmark(executor_class, max_workers: int):
    """Compare thread and process pools on a CPU-bound task."""
    # Contiguous half-open chunks covering 1..199_999
    ranges = [
        (1, 50_000),
        (50_000, 100_000),
        (100_000, 150_000),
        (150_000, 200_000),
    ]
    start = time.perf_counter()
    with executor_class(max_workers=max_workers) as executor:
        futures = [executor.submit(count_primes_in_range, s, e) for s, e in ranges]
        results = [f.result() for f in futures]
    elapsed = time.perf_counter() - start
    total = sum(results)
    return elapsed, total


# Comparison
if __name__ == "__main__":
    cpu_count = os.cpu_count() or 4
    print(f"CPU cores: {cpu_count}\n")
    t_elapsed, t_total = benchmark(ThreadPoolExecutor, cpu_count)
    p_elapsed, p_total = benchmark(ProcessPoolExecutor, cpu_count)
    print(f"ThreadPool: {t_elapsed:.2f}s, found {t_total} primes")
    print(f"ProcessPool: {p_elapsed:.2f}s, found {p_total} primes")
    print(f"Speedup: {t_elapsed / p_elapsed:.1f}x")
```
Typical output (4-core machine):
```
ThreadPool: 8.52s, found 17984 primes
ProcessPool: 2.41s, found 17984 primes
Speedup: 3.5x
```
Three ways to submit work to an Executor
Method 1: submit + Future.result()
```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_url(url: str) -> str:
    """Simulate a network request."""
    delay = random.uniform(0.5, 2.0)
    time.sleep(delay)
    return f"Fetched {url} in {delay:.2f}s"


with ThreadPoolExecutor(max_workers=5) as executor:
    # submit returns a Future immediately without blocking
    future_to_url = {
        executor.submit(fetch_url, f"https://api.example.com/item/{i}"): i
        for i in range(10)
    }
    # Handle results in completion order
    for future in as_completed(future_to_url):
        url_index = future_to_url[future]
        try:
            result = future.result()
            print(f" [{url_index}] {result}")
        except Exception as e:
            print(f" [{url_index}] Failed: {e}")
```
Method 2: map, batch submission that preserves order
```python
from concurrent.futures import ThreadPoolExecutor


def process(n: int) -> int:
    return n * n


with ThreadPoolExecutor(max_workers=4) as executor:
    # map returns an iterator that yields results in input order
    results = executor.map(process, range(10))
    print(list(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
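map vs submit:
| Dimension | submit | map |
|---|---|---|
| Return value | Returns a Future immediately | Returns a lazy iterator over the results (the tasks themselves are submitted right away) |
| Result order | Completion order, via as_completed | Same order as the inputs |
| Exception handling | Raised by future.result() | Raised when iteration reaches the failing element |
| Use case | Handling results as they complete; timeouts on individual Futures | Batch operations whose results must stay in order |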
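The exception-handling row is easy to misread, so here is a small sketch of the difference. The `reciprocal` helper and both `collect_*` functions are made up for illustration; only `map`, `submit`, `Future.exception()` and `Future.result()` come from `concurrent.futures`.

```python
from concurrent.futures import ThreadPoolExecutor


def reciprocal(n: int) -> float:
    return 1 / n  # raises ZeroDivisionError for n == 0


def collect_with_map(values):
    """Iterate map results; the first failing element ends the iteration."""
    collected = []
    error = None
    with ThreadPoolExecutor(max_workers=2) as executor:
        try:
            for value in executor.map(reciprocal, values):
                collected.append(value)
        except ZeroDivisionError as exc:
            error = exc
    return collected, error


def collect_with_submit(values):
    """Inspect each Future separately; one failure does not hide the others."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(reciprocal, v) for v in values]
    outcomes = []
    for future in futures:
        exc = future.exception()  # returns the exception instead of raising it
        outcomes.append(exc if exc is not None else future.result())
    return outcomes


values = [1, 2, 0, 4]
map_results, map_error = collect_with_map(values)
submit_outcomes = collect_with_submit(values)
print(map_results, type(map_error).__name__)
print(submit_outcomes)
```

With `map`, the result for `4` is never seen because iteration stops at the failing element; with `submit`, every outcome stays individually reachable.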
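Method 3: submit + wait, for fine-grained control
```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED


def fetch_url(url: str) -> str:
    """Simulate a network request (same helper as in method 1)."""
    delay = random.uniform(0.5, 2.0)
    time.sleep(delay)
    return f"Fetched {url} in {delay:.2f}s"


with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(fetch_url, f"url-{i}"): i for i in range(20)}

    # Wait until the first Future finishes
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    print(f"First completed: {next(iter(done)).result()}")
    print(f"Still pending: {len(not_done)}")

    # Then wait for every Future to finish
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    print(f"All {len(done)} tasks completed")
```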
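Options for wait's return_when
| Constant | Behavior |
|---|---|
| ALL_COMPLETED (default) | Returns when all Futures have finished or been cancelled |
| FIRST_COMPLETED | Returns when any Future finishes or is cancelled |
| FIRST_EXCEPTION | Returns when any Future raises an exception; if none does, returns when all have finished |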
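FIRST_EXCEPTION is the option the examples above do not exercise. The sketch below shows one way it might be used to fail fast; `slow_ok` and `fail_fast` are hypothetical tasks, and a `threading.Event` stands in for real work so the outcome does not depend on timing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION

release = threading.Event()  # keeps the slow tasks blocked until we say so


def slow_ok(n: int) -> int:
    release.wait(timeout=5)
    return n


def fail_fast() -> None:
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(slow_ok, 1),
        executor.submit(slow_ok, 2),
        executor.submit(fail_fast),
    ]
    # Returns as soon as one Future has finished with an exception.
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
    failed = [f for f in done if f.exception() is not None]
    pending_count = len(not_done)
    release.set()  # let the remaining tasks finish so the pool can shut down

print(f"failed={len(failed)}, still pending at that moment={pending_count}")
```

Note that `wait` only reports the failure; the other tasks keep running unless the caller cancels or signals them.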
The Future object's API
```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_task(n: int) -> int:
    time.sleep(n)
    return n * 2


with ThreadPoolExecutor() as executor:
    future = executor.submit(slow_task, 5)

    # State checks (values assume a worker has already picked up the task)
    print(f"Running: {future.running()}")      # True
    print(f"Done: {future.done()}")            # False
    print(f"Cancelled: {future.cancelled()}")  # False

    # Cancel (only succeeds if the task has not started running)
    cancelled = future.cancel()
    print(f"Cancel attempt: {cancelled}")      # False (already running)

    # Fetch the result with a timeout
    try:
        result = future.result(timeout=2)  # stop waiting after 2 seconds
    except TimeoutError:
        # Only the wait times out; the task itself keeps running
        print("Task timed out!")

    # Register a completion callback
    def on_done(f):
        print(f"Task completed: {f.result()}")

    future.add_done_callback(on_done)
```
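The rule that `cancel()` only works before a task starts can be observed directly with a one-worker pool. This is an illustrative sketch: `blocker` and the two `Event` objects are hypothetical scaffolding used to make the ordering deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()  # set once the first task is actually running
gate = threading.Event()     # holds the first task until we release it


def blocker() -> str:
    started.set()
    gate.wait(timeout=5)
    return "finished"


with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(blocker)
    queued = executor.submit(blocker)   # waits in the queue behind `running`

    started.wait(timeout=5)
    cancel_running = running.cancel()   # False: already executing
    cancel_queued = queued.cancel()     # True: it never started

    gate.set()                          # let the first task finish
    first_result = running.result(timeout=5)

print(cancel_running, cancel_queued, queued.cancelled(), first_result)
```

Calling `result()` on the cancelled Future would raise `concurrent.futures.CancelledError`, so check `cancelled()` first when cancellation is possible.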
The bridge between asyncio and concurrent.futures
In an asyncio application you will inevitably have to call synchronous functions (third-party libraries, CPU-heavy computation). run_in_executor is the bridge between the asynchronous world and the synchronous one:
```python
import asyncio
import hashlib
import time
from concurrent.futures import ProcessPoolExecutor


# CPU-bound: hashlib's pbkdf2
def hash_password(password: str, salt: str) -> str:
    """Synchronous password hashing (CPU-bound)."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        password.encode(),
        salt.encode(),
        600_000,  # iteration count
    ).hex()


# Default thread pool
async def hash_with_thread_pool(password: str, salt: str) -> str:
    """Run in the default thread pool.

    For pure-Python CPU-bound work this scales poorly because of the GIL;
    C-level code that releases the GIL while it runs is less affected.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, hash_password, password, salt)


# Dedicated process pool
async def hash_with_process_pool(
    pool: ProcessPoolExecutor, password: str, salt: str
) -> str:
    """Run in a dedicated process pool for true parallelism.

    The pool is passed in so it is created once and reused, instead of
    spawning and tearing down worker processes on every call.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, hash_password, password, salt)


# Python 3.9+: a more concise form
async def hash_with_to_thread(password: str, salt: str) -> str:
    """asyncio.to_thread: syntactic sugar for the default thread pool."""
    return await asyncio.to_thread(hash_password, password, salt)


# Usage example
async def main():
    passwords = [
        ("user_password_1", "salt_abc"),
        ("user_password_2", "salt_def"),
        ("user_password_3", "salt_ghi"),
    ]
    # Hash concurrently: each call runs in its own worker thread
    start = time.perf_counter()
    tasks = [hash_with_to_thread(pw, salt) for pw, salt in passwords]
    results = await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    print(f"Hashed {len(passwords)} passwords in {elapsed:.2f}s")
    for (pw, _), result in zip(passwords, results):
        print(f" {pw} → {result[:16]}...")


if __name__ == "__main__":
    asyncio.run(main())
```
How run_in_executor works
```python
import asyncio
import time


# What run_in_executor does internally
async def run_in_executor_explained():
    loop = asyncio.get_running_loop()

    def blocking_task():
        time.sleep(2)  # synchronous, blocking
        return "done"

    # 1. Submit blocking_task to the thread pool
    # 2. Create a Future in the event loop
    # 3. When the pool finishes, pass the result back to the event loop
    #    via call_soon_threadsafe
    result = await loop.run_in_executor(None, blocking_task)
    # Roughly equivalent to:
    # concurrent_future = executor.submit(blocking_task)
    # asyncio_future = asyncio.wrap_future(concurrent_future)
    # result = await asyncio_future
    return result
```
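The equivalence described in the comments can be checked with a short runnable sketch. It uses an explicit `ThreadPoolExecutor` (an assumption made here so that `submit` has a pool to act on) and a shortened sleep; `blocking_task` mirrors the helper above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task() -> str:
    time.sleep(0.1)  # stands in for a blocking call
    return "done"


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Path 1: let the event loop do the wrapping.
        via_helper = await loop.run_in_executor(pool, blocking_task)

        # Path 2: submit manually, then wrap the concurrent.futures.Future
        # in an awaitable asyncio Future.
        concurrent_future = pool.submit(blocking_task)
        via_wrap = await asyncio.wrap_future(concurrent_future)
    return via_helper, via_wrap


results = asyncio.run(main())
print(results)
```

Both paths leave the event loop free to run other coroutines while the worker thread is blocked.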
Hands-on project 1: batch image processing
```python
"""Batch image processing: a process pool for CPU-heavy image operations."""
import os
import time
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor, as_completed, Future
from typing import NamedTuple

from PIL import Image


class ProcessResult(NamedTuple):
    path: str
    original_size: tuple[int, int]
    new_size: tuple[int, int]
    original_bytes: int
    new_bytes: int
    elapsed_ms: float


def process_single_image(
    image_path: str,
    output_dir: str,
    max_size: tuple[int, int] = (1200, 1200),
    quality: int = 85,
) -> ProcessResult:
    """Process one image: resize, optimize, convert to JPEG."""
    start = time.perf_counter()
    original_bytes = os.path.getsize(image_path)
    with Image.open(image_path) as img:
        original_size = img.size
        # Resize in place, preserving aspect ratio
        img.thumbnail(max_size, Image.Resampling.LANCZOS)
        new_size = img.size
        # Convert to RGB (JPEG cannot store alpha or palette modes)
        if img.mode in ("RGBA", "LA", "P"):
            img = img.convert("RGB")
        # Save into the output directory
        output_path = Path(output_dir) / f"{Path(image_path).stem}.jpg"
        img.save(output_path, "JPEG", quality=quality, optimize=True)
    new_bytes = os.path.getsize(output_path)
    elapsed = (time.perf_counter() - start) * 1000
    return ProcessResult(
        path=str(output_path),
        original_size=original_size,
        new_size=new_size,
        original_bytes=original_bytes,
        new_bytes=new_bytes,
        elapsed_ms=elapsed,
    )


def batch_process_images(
    image_paths: list[str],
    output_dir: str,
    max_workers: int | None = None,
) -> dict[str, ProcessResult]:
    """Process images in bulk using a process pool."""
    max_workers = max_workers or os.cpu_count() or 4
    # Make sure the output directory exists before workers write to it
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    results: dict[str, ProcessResult] = {}
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_path: dict[Future[ProcessResult], str] = {
            executor.submit(process_single_image, path, output_dir): path
            for path in image_paths
        }
        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                result = future.result()
                results[path] = result
            except Exception as e:
                print(f"Failed to process {path}: {e}")
    return results


# Usage
if __name__ == "__main__":
    images = [str(p) for p in Path("photos").glob("*.jpg")]
    results = batch_process_images(images, "photos/processed")
    total_saved = sum(
        r.original_bytes - r.new_bytes for r in results.values()
    )
    print(f"\nProcessed {len(results)} images")
    print(f"Total space saved: {total_saved / (1024 * 1024):.1f} MB")
```
Hands-on project 2: a concurrent API aggregator
```python
"""Concurrent API aggregator: a thread pool for I/O-bound HTTP requests."""
import time
from typing import Any
from dataclasses import dataclass, field
from concurrent.futures import ThreadPoolExecutor, Future, as_completed

import httpx


@dataclass
class AggregatedResponse:
    """Aggregated response: the result of one endpoint, ready to be merged."""
    endpoint: str
    data: dict[str, Any]
    elapsed_ms: float
    errors: list[str] = field(default_factory=list)


class APIAggregator:
    """Requests several endpoints concurrently and merges the responses."""

    def __init__(
        self,
        base_url: str,
        max_workers: int = 10,
        timeout: float = 30.0,
    ):
        self.base_url = base_url
        self.max_workers = max_workers
        self.timeout = timeout
        self._client: httpx.Client | None = None

    def __enter__(self):
        self._client = httpx.Client(
            base_url=self.base_url,
            timeout=self.timeout,
        )
        return self

    def __exit__(self, *args):
        if self._client:
            self._client.close()

    def fetch_endpoint(self, path: str) -> AggregatedResponse:
        """Fetch a single endpoint (runs in a worker thread)."""
        if not self._client:
            raise RuntimeError("Use as context manager")
        start = time.perf_counter()
        try:
            response = self._client.get(path)
            response.raise_for_status()
            data = response.json()
            errors = []
        except httpx.HTTPError as e:
            data = {}
            errors = [str(e)]
        elapsed = (time.perf_counter() - start) * 1000
        return AggregatedResponse(
            endpoint=path,
            data=data,
            elapsed_ms=elapsed,
            errors=errors,
        )

    def aggregate(self, endpoints: list[str]) -> dict[str, AggregatedResponse]:
        """Request all endpoints concurrently and collect the results."""
        results: dict[str, AggregatedResponse] = {}
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            future_to_endpoint: dict[Future[AggregatedResponse], str] = {
                executor.submit(self.fetch_endpoint, ep): ep
                for ep in endpoints
            }
            for future in as_completed(future_to_endpoint):
                endpoint = future_to_endpoint[future]
                try:
                    results[endpoint] = future.result()
                except Exception as e:
                    # Anything fetch_endpoint did not handle (e.g. invalid JSON)
                    results[endpoint] = AggregatedResponse(
                        endpoint=endpoint,
                        data={},
                        elapsed_ms=0,
                        errors=[str(e)],
                    )
        return results


# Usage example
def main():
    endpoints = [
        "/users/1",
        "/users/2",
        "/posts/1",
        "/posts/2",
        "/comments/1",
        "/todos/1",
    ]
    with APIAggregator("https://jsonplaceholder.typicode.com") as agg:
        start = time.perf_counter()
        results = agg.aggregate(endpoints)
        total_time = (time.perf_counter() - start) * 1000
    # Report
    print(f"\nAggregated {len(results)} endpoints in {total_time:.0f}ms")
    print(f"Longest single request: {max(r.elapsed_ms for r in results.values()):.0f}ms")
    error_count = sum(len(r.errors) for r in results.values())
    if error_count:
        print(f"Errors: {error_count}")
        for ep, r in results.items():
            if r.errors:
                print(f" {ep}: {r.errors}")


if __name__ == "__main__":
    main()
```
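Setting max_workers systematically
```python
import os


def optimal_workers(task_type: str) -> int:
    """Recommend max_workers for a given task type."""
    cpu_count = os.cpu_count() or 4
    if task_type == "io":
        # I/O-bound: many threads, because the GIL is released during I/O
        # Rule of thumb: CPU cores × 5, capped at 50
        return min(cpu_count * 5, 50)
    elif task_type == "cpu":
        # CPU-bound: equal to the number of CPU cores
        # More workers do not speed things up; they only add context switches
        return cpu_count
    elif task_type == "mixed":
        # Mixed: 1.5 × CPU cores as a compromise
        return int(cpu_count * 1.5)
    else:
        return cpu_count


# Python 3.8+ already picks reasonable defaults when max_workers is omitted:
# ThreadPoolExecutor: min(32, (os.cpu_count() or 1) + 4)
# ProcessPoolExecutor: os.cpu_count() or 1
# (Python 3.13+ bases both on os.process_cpu_count() instead)
```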
Verifying the best max_workers
```python
import time
from concurrent.futures import ThreadPoolExecutor


def verify_io_workers():
    """Find a good I/O thread count by experiment."""

    def io_task(n: int) -> None:
        time.sleep(0.1)  # simulated I/O

    task_count = 200
    for workers in [1, 2, 4, 8, 16, 32, 64, 128]:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            list(executor.map(io_task, range(task_count)))
        elapsed = time.perf_counter() - start
        print(f" Workers: {workers:>3} → {elapsed:.2f}s "
              f"({task_count / elapsed:.0f} tasks/s)")


verify_io_workers()
```
Pitfalls to avoid
Pitfall 1: passing unpicklable objects to a process pool
```python
from concurrent.futures import ProcessPoolExecutor


# ✅ Use a named, module-level function
def double(x):
    return x * 2


if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # ❌ A lambda cannot be pickled; calling result() on this Future
        #    would raise a pickling error:
        # executor.submit(lambda x: x * 2, 5)

        future = executor.submit(double, 5)
        print(future.result())  # 10
```
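Pitfall 2: modifying a global variable inside a process pool has no effect
```python
from concurrent.futures import ProcessPoolExecutor

# ❌ A global modified inside a process pool changes only in the worker process
counter = 0


def increment(_):
    global counter
    counter += 1
    return counter


if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(increment, range(10)))
    # Each worker process increments its own copy of counter, so the values in
    # results depend on which worker ran each task and never form a shared total
    print(counter)  # 0: the parent process's counter is untouched
```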
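Pitfall 3: forgetting to shut down the Executor
```python
from concurrent.futures import ThreadPoolExecutor


def task(n: int) -> int:
    return n * n


# ❌ Never shutting down an Executor can leak resources
executor = ThreadPoolExecutor(max_workers=4)
executor.submit(task, 1)

# ✅ Use a context manager to shut down automatically
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.submit(task, 1)
# Leaving the with block calls executor.shutdown(wait=True)
```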
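When an executor has to outlive a single `with` block, for example because a long-lived object owns it, an explicit `shutdown()` in a `finally` clause is one reasonable alternative. The sketch below assumes Python 3.9+ for `cancel_futures`; `ReportService` and its methods are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class ReportService:
    """Hypothetical long-lived service that owns an executor."""

    def __init__(self, max_workers: int = 4) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def render_all(self, names: list[str]) -> list[str]:
        # str.upper stands in for real per-item work.
        return list(self._executor.map(str.upper, names))

    def close(self) -> None:
        # cancel_futures (Python 3.9+) drops queued tasks that have not started.
        self._executor.shutdown(wait=True, cancel_futures=True)


service = ReportService()
try:
    rendered = service.render_all(["daily", "weekly"])
finally:
    service.close()

# After shutdown, new submissions are rejected with RuntimeError.
try:
    service.render_all(["monthly"])
    rejected = False
except RuntimeError:
    rejected = True

print(rendered, rejected)
```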
Pitfall 4: process pool overhead exceeds the benefit
```python
import time
from concurrent.futures import ProcessPoolExecutor


def trivial_task(n: int) -> int:
    return n * 2


if __name__ == "__main__":
    # ❌ Process startup + serialization overhead exceeds the task's run time
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        list(executor.map(trivial_task, range(1000)))
    print(f"ProcessPool trivial: {time.perf_counter() - start:.3f}s")

    # ✅ For trivial tasks, use a thread pool or a plain loop
    start = time.perf_counter()
    list(map(trivial_task, range(1000)))
    print(f"Direct trivial: {time.perf_counter() - start:.3f}s")
```
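concurrent.futures API cheat sheet
| API | Purpose | Notes |
|---|---|---|
| executor.submit(fn, *args) | Submit a task; returns a Future | Returns immediately, does not block |
| executor.map(fn, *iterables) | Submit a batch; returns an iterator | Preserves input order; tasks are submitted right away, results are yielded lazily |
| executor.shutdown(wait=True) | Shut down the Executor | Called automatically by the context manager |
| future.result(timeout=None) | Get the result | Blocks until completion or timeout |
| future.exception(timeout=None) | Get the exception without raising it | Returns None if there was no exception |
| future.done() | Whether the task has finished | Non-blocking |
| future.cancel() | Cancel the task | Only works before execution has started |
| future.add_done_callback(fn) | Register a completion callback | Called in a thread of the process that added it, usually the worker that completed the task; runs immediately if the Future is already done |
| as_completed(futures) | Iterate in completion order | Returns an iterator |
| wait(futures, return_when) | Wait on a group of Futures | Returns two sets: done and not_done |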
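The concurrency model landscape
For any concurrency or parallelism need, choose by task type first, then by whether the code is synchronous or asynchronous:
| Situation | Tool | Notes |
|---|---|---|
| CPU-bound (computation, processing, encryption and decryption) | ProcessPoolExecutor | Bypasses the GIL for true parallelism; max_workers = CPU cores |
| I/O-bound (network, files, databases), synchronous code | ThreadPoolExecutor | Thread-pool I/O concurrency; max_workers = CPU cores × 5 |
| I/O-bound, asynchronous code (async/await) | asyncio + httpx/aiohttp | Coroutine I/O concurrency; no pool needed, the event loop does the scheduling |
| Calling synchronous code from async code | run_in_executor or asyncio.to_thread() | Offloads the blocking call to a thread pool |
| Calling CPU-bound code from async code | run_in_executor with a ProcessPoolExecutor | Offloads the computation to worker processes |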