[Fluent Python] Chapter 20: Concurrent Executors (Study Notes)

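The people bashing threads are typically system programmers who have in mind use cases that the typical application programmer will never encounter in a lifetime. In 99% of the use cases an application programmer is likely to run into, the simple pattern of spawning a bunch of independent threads and collecting the results in a queue is everything one needs to know.

-- Michele Simionato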

Chapter overview
Concurrent web downloads: three approaches compared
- The sequential download script
- Downloading concurrently with concurrent.futures
- Future objects in detail
Launching processes with concurrent.futures
- Using ProcessPoolExecutor
- ThreadPoolExecutor vs ProcessPoolExecutor
Experimenting with Executor.map
Showing progress and handling errors: the flags2 series
- Error-handling strategy
- Using futures.as_completed
Summary

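Chapter overview

The heart of this chapter is the concurrent.futures module, a high-level wrapper around the pattern of "spawning a bunch of independent threads and collecting the results in a queue". It is very concise to use and suits both I/O-bound tasks (threads) and CPU-bound tasks (processes).

The chapter also introduces the important concept of a future: an object that represents the asynchronous execution of an operation, similar to a Promise in JavaScript. Futures are the foundation of both concurrency frameworks, concurrent.futures and asyncio.

Core classes and tools: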

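| Tool | Description |
| --- | --- |
| `ThreadPoolExecutor` | Thread pool, suited to I/O-bound tasks |
| `ProcessPoolExecutor` | Process pool, suited to CPU-bound tasks; sidesteps the GIL |
| `Executor.map()` | Like the built-in `map`, but calls the same function concurrently |
| `Executor.submit()` | Submits a single task and returns a `Future` object |
| `futures.as_completed()` | Yields future objects as they complete |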

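Concurrent web downloads: three approaches compared

The book compares three approaches by downloading the flag images of 20 countries:

flags.py            → sequential download     → 7.18 s on average
flags_threadpool.py → thread pool, concurrent → 1.40 s on average  (about 5x faster)
flags_asyncio.py    → asynchronous coroutines → 1.35 s on average  (about 5x faster)

Key point: for I/O-bound work such as network requests, the concurrent versions ran about 5 times faster in the book's measurements. For an HTTP client the difference between threads and coroutines is barely noticeable; the real gap shows up in server-side scalability, where coroutines use less memory and switch between tasks more cheaply than threads.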

The sequential download script

flags.py is the baseline script. Its core structure is:

import time
from pathlib import Path
import httpx

POP20_CC = ('CH IN US ID BR PK NG BD RU JP '
            'MX PH VN ET EG DE IR TR CD FR').split()

BASE_URL = 'http://example.com/flags'
DEST_DIR = Path('downloaded')

def save_flag(img: bytes, filename: str) -> None:
    """Save the image bytes to the local directory"""
    (DEST_DIR / filename).write_bytes(img)

def get_flag(cc: str) -> bytes:
    """Build the URL from the country code, download the image and return its content"""
    url = f'{BASE_URL}/{cc}/{cc}.gif'.lower()
    resp = httpx.get(url, timeout=6.1, follow_redirects=True)
    resp.raise_for_status()   # raises an exception if the HTTP status is not 2XX
    return resp.content

def download_many(cc_list: list[str]) -> int:
    """Sequential download: request the next image only after the previous one is saved"""
    for cc in sorted(cc_list):
        image = get_flag(cc)
        save_flag(image, f'{cc}.gif')
        print(cc, end=' ', flush=True)  # flush=True prints right away instead of waiting for a newline
    return len(cc_list)

def main(downloader):
    DEST_DIR.mkdir(exist_ok=True)
    t0 = time.perf_counter()
    count = downloader(POP20_CC)
    elapsed = time.perf_counter() - t0
    print(f'\n{count} downloads in {elapsed:.2f}s')

if __name__ == '__main__':
    main(download_many)

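Note: httpx is the HTTP library the book recommends. It offers both synchronous and asynchronous APIs and, unlike requests, does not follow redirects by default (follow_redirects=False). Calling resp.raise_for_status() is good practice: a failed request raises an error right away instead of being silently ignored.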

Downloading concurrently with concurrent.futures

flags_threadpool.py implements concurrent downloads with only a handful of new lines:

from concurrent import futures
from flags import save_flag, get_flag, main  # 复用依序版的函数

def download_one(cc: str):
    """Download a single image; this is the function each worker thread runs"""
    image = get_flag(cc)
    save_flag(image, f'{cc}.gif')
    print(cc, end=' ', flush=True)
    return cc

def download_many(cc_list: list[str]) -> int:
    # Used as a context manager: on exit it waits for all threads to finish
    with futures.ThreadPoolExecutor() as executor:
        # map calls download_one concurrently with arguments from sorted(cc_list)
        res = executor.map(download_one, sorted(cc_list))
    return len(list(res))  # iterate over the results; an exception raised in a worker is re-raised here

if __name__ == '__main__':
    main(download_many)

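Key point: the biggest advantage of concurrent.futures is how easily it adds concurrency to existing sequential code. Extract the body of the sequential for loop into a function, call it through executor.map, and the conversion is done.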
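The comment in download_many says that an exception raised in a worker resurfaces when the results are iterated. The following is a minimal sketch of that behavior, not code from the book: reciprocal and consume are hypothetical names, and a division by zero stands in for a failed download.

```python
from concurrent import futures


def reciprocal(x: float) -> float:
    """Hypothetical worker that fails when x is 0."""
    return 1 / x


def consume(values):
    """Collect results from executor.map until a worker's exception resurfaces."""
    collected = []
    error = None
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        # map returns at once; nothing is raised at this point
        results = executor.map(reciprocal, values)
        try:
            for value in results:  # results arrive in submission order
                collected.append(value)
        except ZeroDivisionError as exc:
            # raised while iterating, at the position of the failing call
            error = exc
    return collected, error


if __name__ == '__main__':
    print(consume([1, 2, 0, 4]))
```

Once the exception has been raised the generator is exhausted, so the result for the last argument can no longer be retrieved through map. When every outcome matters, submit plus as_completed gives access to each future individually.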

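About the default value of max_workers:

# How ThreadPoolExecutor computes its default thread count in Python 3.8 to 3.12
# (Python 3.13 uses os.process_cpu_count() in place of os.cpu_count())
import os
max_workers = min(32, (os.cpu_count() or 1) + 4)  # cpu_count() may return None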

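This design ensures that:

At least 5 worker threads are kept available for I/O-bound tasks
At most 32 CPU cores are used for CPU-bound tasks that release the GIL, which avoids wasting resources on many-core machines
Idle worker threads are reused before new ones are started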

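Future objects in detail

Futures are the core low-level component of both concurrent.futures and asyncio. Python has two classes with the same name:

concurrent.futures.Future
asyncio.Future

Both serve the same purpose: they represent a deferred computation that may or may not have completed.

Key principle: user code should not create Future objects by hand; they are meant to be instantiated by the concurrency framework. Executor.submit() is the main way to obtain one.

Commonly used Future methods: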

from concurrent import futures
import time

def slow_task(n):
    time.sleep(n)
    return f'done after {n} s'

executor = futures.ThreadPoolExecutor(max_workers=3)

# submit() schedules the task and returns a Future right away (non-blocking)
future = executor.submit(slow_task, 2)

print(future.done())          # False: the task has not finished yet
print(future)                 # <Future at 0x... state=running>

# result() blocks until the task completes, or raises TimeoutError on timeout
result = future.result(timeout=5)
print(result)                 # 'done after 2 s'
print(future.done())          # True

# Add a callback that is called automatically when the task completes
def on_done(f):
    print(f'callback triggered, result: {f.result()}')

future2 = executor.submit(slow_task, 1)
future2.add_done_callback(on_done)

executor.shutdown(wait=True)
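Besides done(), result() and add_done_callback(), a future also offers cancel(), cancelled() and exception(). The sketch below is an illustration under stated assumptions rather than the book's code: demo, blocker and failing are hypothetical names, and a single-worker pool is used only so that later submissions are guaranteed to stay pending.

```python
import threading
from concurrent import futures


def demo() -> dict:
    started = threading.Event()  # set once the first task is actually running
    release = threading.Event()  # lets the blocked task finish

    def blocker() -> str:
        started.set()
        release.wait(timeout=5)
        return 'blocker done'

    def failing() -> None:
        raise ValueError('boom')

    observed = {}
    with futures.ThreadPoolExecutor(max_workers=1) as executor:
        running = executor.submit(blocker)
        started.wait(timeout=5)             # the only worker is now busy
        failed = executor.submit(failing)   # stays pending for now
        queued = executor.submit(blocker)   # stays pending as well

        observed['cancel_running'] = running.cancel()  # a running task cannot be cancelled
        observed['cancel_queued'] = queued.cancel()    # a pending task can
        release.set()

        observed['result'] = running.result(timeout=5)
        # exception() returns the exception object instead of raising it
        observed['exception'] = failed.exception(timeout=5)
        try:
            failed.result()                 # result() re-raises it
        except ValueError as exc:
            observed['reraised'] = str(exc)
        observed['cancelled'] = queued.cancelled()
    return observed


if __name__ == '__main__':
    print(demo())
```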

executor.map compared with executor.submit + as_completed:

With as_completed you work with the Future objects directly and can inspect the state of each task:

from concurrent import futures
import time

def slow_task(n):
    time.sleep(n)
    return f'task_{n}_done'

tasks = [3, 1, 4, 1, 5, 2]

with futures.ThreadPoolExecutor(max_workers=3) as executor:
    to_do = []
    for n in tasks:
        future = executor.submit(slow_task, n)
        to_do.append(future)
        print(f'scheduled {n}s task: {future}')

    # as_completed yields each future as soon as it is done, not in submission order
    for count, future in enumerate(futures.as_completed(to_do), 1):
        res = future.result()  # does not block here because the future is already done
        print(f'#{count} finished, result: {res!r}, future: {future}')

Sample output:

scheduled 3s task: <Future at 0x... state=running>
scheduled 1s task: <Future at 0x... state=running>
scheduled 4s task: <Future at 0x... state=running>
scheduled 1s task: <Future at 0x... state=pending>   ← no free thread, waiting
scheduled 5s task: <Future at 0x... state=pending>
scheduled 2s task: <Future at 0x... state=pending>
#1 finished, result: 'task_1_done' ...              ← the 1s task finishes first
#2 finished, result: 'task_1_done' ...
#3 finished, result: 'task_3_done' ...              ← the 2s task starts only at the 3 s mark
...

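Key differences:

executor.map: results come back in submission order; a good fit for batch-processing tasks of the same kind

executor.submit + as_completed: results come back in completion order; more flexible, since it allows mixing different callables and arguments and also accepts futures from several different executors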
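The two orderings can be put side by side in a small sketch. The names nap, ordered_by_submission and ordered_by_completion and the DELAYS values are assumptions made for this illustration; the delays are spaced far enough apart that the completion order is predictable.

```python
import time
from concurrent import futures

DELAYS = [0.6, 0.1, 0.3]  # hypothetical task durations in seconds


def nap(delay: float) -> float:
    """Sleep for `delay` seconds and return it, standing in for a slow task."""
    time.sleep(delay)
    return delay


def ordered_by_submission() -> list:
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        # map yields results in the order of the input arguments
        return list(executor.map(nap, DELAYS))


def ordered_by_completion() -> list:
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        to_do = [executor.submit(nap, delay) for delay in DELAYS]
        # as_completed yields each future when it finishes
        return [future.result() for future in futures.as_completed(to_do)]


if __name__ == '__main__':
    print('map:         ', ordered_by_submission())
    print('as_completed:', ordered_by_completion())
```

With map the 0.6 s task comes first in the output even though it finishes last, so the consumer sees nothing until it is done; with as_completed the shortest task is reported first.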

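Launching processes with concurrent.futures

Using ProcessPoolExecutor

ProcessPoolExecutor runs multiple Python processes, so it achieves true parallel computation and sidesteps the GIL entirely. Switching is very simple: just replace ThreadPoolExecutor with ProcessPoolExecutor: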

from concurrent import futures

def download_many(cc_list: list[str]) -> int:
    # Only this line changes; the interface is identical
    with futures.ProcessPoolExecutor() as executor:
        res = executor.map(download_one, sorted(cc_list))
    return len(list(res))

The max_workers argument of ProcessPoolExecutor defaults to None, in which case the number of processes equals os.cpu_count().

ThreadPoolExecutor vs ProcessPoolExecutor

from concurrent import futures
import math
import time

# CPU-bound task example: check whether a large number is prime
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    # math.isqrt is exact for large ints, unlike int(math.sqrt(n))
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True

LARGE_NUMBERS = [9_999_999_999_999_999, 9_999_999_999_999_917,
                 7_777_777_777_777_753, 6_666_666_666_666_719]

# Option 1: ThreadPoolExecutor (limited by the GIL, a poor fit for CPU-bound work)
def test_thread_pool():
    t0 = time.perf_counter()
    with futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(is_prime, LARGE_NUMBERS))
    elapsed = time.perf_counter() - t0
    print(f'ThreadPoolExecutor: {elapsed:.2f}s, results: {results}')

# Option 2: ProcessPoolExecutor (truly parallel, a good fit for CPU-bound work)
def test_process_pool():
    t0 = time.perf_counter()
    with futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(is_prime, LARGE_NUMBERS))
    elapsed = time.perf_counter() - t0
    print(f'ProcessPoolExecutor: {elapsed:.2f}s, results: {results}')

if __name__ == '__main__':
    # ProcessPoolExecutor is noticeably faster on a multicore machine
    test_thread_pool()
    test_process_pool()

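Selection guidelines:

I/O-bound tasks (network requests, file reads and writes, database queries) → use ThreadPoolExecutor; thread switching overhead is low

CPU-bound tasks (math, image processing, data analysis) → use ProcessPoolExecutor to sidestep the GIL and run truly in parallel

Processes use more memory than threads and take longer to start, so do not overuse them

The prime check rewritten with ProcessPoolExecutor (core structure of the book's proc_pool.py):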

import math
from concurrent import futures
from time import perf_counter
from typing import NamedTuple

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    # math.isqrt is exact for large ints, unlike int(n**0.5)
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True

class PrimeResult(NamedTuple):
    n: int
    flag: bool
    elapsed: float

def check(n: int) -> PrimeResult:
    t0 = perf_counter()
    res = is_prime(n)
    return PrimeResult(n, res, perf_counter() - t0)

NUMBERS = [
    9_999_999_999_999_999,
    9_999_999_999_999_917,
    7_777_777_777_777_777,
    7_777_777_777_777_753,
    # ...
]

def main():
    workers = None  # None = default: as many processes as CPU cores
    # _max_workers is an undocumented private attribute, used here to show the actual process count
    executor = futures.ProcessPoolExecutor(workers)
    actual_workers = executor._max_workers  # type: ignore
    print(f'Checking {len(NUMBERS)} numbers with {actual_workers} processes:')
    t0 = perf_counter()
    numbers = sorted(NUMBERS, reverse=True)  # descending order: check the largest numbers first
    with executor:
        # executor.map returns results in input order, not completion order
        for n, prime, elapsed in executor.map(check, numbers):
            label = 'P' if prime else ' '
            print(f'{n:16}  {label}  {elapsed:9.6f}s')
    print(f'Total time: {perf_counter() - t0:.2f}s')

if __name__ == '__main__':
    main()

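Note the ordering behavior of executor.map: results follow the input order even when some subtasks finish earlier. If a particularly slow task sits near the front of the list, the results after it are held back until it completes and then appear together. To process results in completion order, use executor.submit + as_completed instead.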

Experimenting with Executor.map

The book provides an example (demo_executor_map.py) that makes the execution mechanics of ThreadPoolExecutor.map easy to see:

from time import sleep, strftime
from concurrent import futures

def display(*args):
    """Print a message prefixed with a timestamp"""
    print(strftime('[%H:%M:%S]'), end=' ')
    print(*args)

def loiter(n):
    """Simulate a task that takes n seconds; the indentation makes the concurrency visible"""
    msg = '{}loiter({}): starting, will wait {}s...'
    display(msg.format('\t' * n, n, n))
    sleep(n)
    msg = '{}loiter({}): done.'
    display(msg.format('\t' * n, n))
    return n * 10  # return value: 10 times n

def main():
    display('Script starting.')
    executor = futures.ThreadPoolExecutor(max_workers=3)  # only 3 worker threads
    results = executor.map(loiter, range(5))              # submit 5 tasks
    display('results:', results)                          # returns immediately: it is a generator
    display('Waiting for individual results:')
    for i, result in enumerate(results):                  # iteration waits for each result in order
        display(f'result {i}: {result}')

if __name__ == '__main__':
    main()

Output (with timestamps):

[15:56:50] Script starting.
[15:56:50] loiter(0): starting, will wait 0s...
[15:56:50] loiter(0): done.
[15:56:50]     loiter(1): starting, will wait 1s...
[15:56:50]         loiter(2): starting, will wait 2s...
[15:56:50] results: <generator object result_iterator at 0x...>
[15:56:50]             loiter(3): starting, will wait 3s...
[15:56:50] Waiting for individual results:
[15:56:50] result 0: 0          ← loiter(0) is already done, no waiting
[15:56:51]     loiter(1): done.
[15:56:51]                 loiter(4): starting, will wait 4s...
[15:56:51] result 1: 10         ← waited 1 second
[15:56:52]         loiter(2): done.
[15:56:52] result 2: 20
[15:56:53]             loiter(3): done.
[15:56:53] result 3: 30
[15:56:55]                 loiter(4): done.
[15:56:55] result 4: 40         ← 5 seconds after the script started (loiter(4) began at 15:56:51)

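This experiment shows the key behaviors of executor.map:

executor.map(func, iterable) returns a generator right away, without blocking
A worker thread picks up a new task as soon as it is free (after loiter(0) finishes, the same thread immediately runs loiter(3))
Iterating over the results generator waits for and returns results in submission order: even if a later task has already finished, its result is not returned until the earlier ones have been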

# What returning in order implies for map:
# if loiter(0) took a long time, waiting for result 0 would hold back
# result 1 and result 2 even though loiter(1) and loiter(2) had finished

# To process results in completion order, use submit + as_completed instead:
def main_as_completed():
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures_list = [executor.submit(loiter, n) for n in range(5)]
        for future in futures.as_completed(futures_list):
            result = future.result()
            display(f'finished: {result}')  # whichever finishes first is printed first

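Choosing between them:

Same function, same kind of argument, order must be preserved → use executor.map

Different functions or arguments, results handled in completion order, or futures from several executors → use executor.submit + futures.as_completed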
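The second case can be sketched as follows. This is only an illustration with assumed names (square, shout, run_mixed and the two pools are hypothetical): futures created by two separate executors, for three different callables, are consumed by a single as_completed loop.

```python
from concurrent import futures


def square(n: int) -> int:
    return n * n


def shout(text: str) -> str:
    return text.upper()


def run_mixed() -> dict:
    """Collect results of different callables submitted to two executors."""
    with futures.ThreadPoolExecutor(max_workers=2) as numbers_pool, \
         futures.ThreadPoolExecutor(max_workers=2) as text_pool:
        # map each future to a label so results can be identified later
        to_do = {
            numbers_pool.submit(square, 7): 'square(7)',
            text_pool.submit(shout, 'done'): "shout('done')",
            numbers_pool.submit(pow, 2, 10): 'pow(2, 10)',
        }
        # as_completed accepts futures from different executors
        return {to_do[future]: future.result()
                for future in futures.as_completed(to_do)}


if __name__ == '__main__':
    print(run_mixed())
```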

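Showing progress and handling errors: the flags2 series

Error-handling strategy

The flags2 examples build on the sequential version by adding a progress bar (the tqdm library) and error handling, which brings them closer to production code.

The function that downloads a single file, with error handling: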

from http import HTTPStatus
from enum import Enum
import httpx

class DownloadStatus(Enum):
    OK = 1
    NOT_FOUND = 2
    ERROR = 3

def get_flag(base_url: str, cc: str) -> bytes:
    url = f'{base_url}/{cc}/{cc}.gif'.lower()
    resp = httpx.get(url, timeout=3.1, follow_redirects=True)
    resp.raise_for_status()
    return resp.content

def download_one(cc: str, base_url: str, verbose: bool = False) -> DownloadStatus:
    try:
        image = get_flag(base_url, cc)
    except httpx.HTTPStatusError as exc:
        res = exc.response
        if res.status_code == HTTPStatus.NOT_FOUND:
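            # 404 is an expected outcome (some countries have no image): record the status, do not re-raise
            status = DownloadStatus.NOT_FOUND
            msg = f'not found: {res.url}'
        else:
            # other HTTP errors (503, 429, etc.) keep propagating upward
            raise
    else:
        # download succeeded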
        status = DownloadStatus.OK
        msg = 'OK'

    if verbose:
        print(cc, msg)
    return status

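Error-handling design principle: expected, business-level errors (such as 404 Not Found) are handled at the lowest level and converted into a status value; unexpected or more serious errors (such as a connection timeout or a server 500) bubble up so the caller can handle them in one place. Avoid swallowing every exception with a blanket except Exception.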
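The principle can be tried out without a network. The sketch below rests on several assumptions: fetch, fetch_one, fetch_many, Status and NotFoundError are hypothetical stand-ins, with NotFoundError playing the role of a 404 response and ConnectionError the role of a network failure.

```python
from collections import Counter
from concurrent import futures
from enum import Enum


class Status(Enum):
    OK = 1
    NOT_FOUND = 2
    ERROR = 3


class NotFoundError(Exception):
    """Hypothetical expected error, standing in for an HTTP 404."""


def fetch(key: str) -> str:
    """Hypothetical stand-in for a download function."""
    if key == 'missing':
        raise NotFoundError(key)
    if key == 'broken':
        raise ConnectionError('simulated network failure')
    return key.upper()


def fetch_one(key: str) -> Status:
    try:
        fetch(key)
    except NotFoundError:
        return Status.NOT_FOUND  # expected: converted into a status here
    return Status.OK             # anything else propagates to the caller


def fetch_many(keys) -> Counter:
    counter = Counter()
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        to_do = {executor.submit(fetch_one, key): key for key in keys}
        for future in futures.as_completed(to_do):
            try:
                status = future.result()
            except ConnectionError:
                status = Status.ERROR  # unexpected errors are handled in one place
            counter[status] += 1
    return counter


if __name__ == '__main__':
    print(fetch_many(['a', 'missing', 'broken', 'b']))
```

Only the two exception types the code knows how to interpret are caught; anything else would still surface from future.result() and stop the loop.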

A simple example of the tqdm progress bar:

import time
from tqdm import tqdm

# Simplest usage: wrap any iterable
for i in tqdm(range(100)):
    time.sleep(0.05)
# The terminal shows:
# 100%|██████████| 100/100 [00:05<00:00, 19.93it/s]

# Giving the total explicitly (for iterators without len)
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_download(url):
    time.sleep(0.1)
    return f'downloaded {url}'

urls = [f'http://example.com/{i}' for i in range(50)]
with ThreadPoolExecutor(max_workers=10) as executor:
    futures_list = [executor.submit(slow_download, url) for url in urls]
    # the iterator returned by as_completed has no len, so total must be given explicitly
    for future in tqdm(as_completed(futures_list), total=len(futures_list)):
        result = future.result()
        # the progress bar advances one step for each finished task

Using futures.as_completed

flags2_threadpool.py is the complete concurrent version, combined with the tqdm progress bar:

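from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
import httpx
import tqdm

# DownloadStatus and download_one are the ones defined in the earlier snippet

DEFAULT_CONCUR_REQ = 30   # default number of concurrent connections
MAX_CONCUR_REQ = 1000     # safety limit that prevents creating too many threads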

def download_many(cc_list: list[str],
                  base_url: str,
                  verbose: bool,
                  concur_req: int) -> Counter:
    counter = Counter()
    with ThreadPoolExecutor(max_workers=concur_req) as executor:
        # Key idiom: a dict maps each future to its country code, so we still
        # know which task a future belongs to when as_completed yields out of order
        to_do_map = {}
        for cc in sorted(cc_list):
            future = executor.submit(download_one, cc, base_url, verbose)
            to_do_map[future] = cc   # future -> country code

        # as_completed returns an iterator that yields each future as soon as it is done
        done_iter = as_completed(to_do_map)
        if not verbose:
            # Wrap it in tqdm to show a live progress bar
            # done_iter has no len, so total must be given explicitly
            done_iter = tqdm.tqdm(done_iter, total=len(cc_list))

        for future in done_iter:
            try:
                status = future.result()    # the future is already done, so this does not block
            except httpx.HTTPStatusError as exc:
                error_msg = f'HTTP {exc.response.status_code}'
                status = DownloadStatus.ERROR
            except httpx.RequestError as exc:
                error_msg = f'network error: {exc}'
                status = DownloadStatus.ERROR
            except KeyboardInterrupt:
                break
            else:
                error_msg = ''

            counter[status] += 1
            if verbose and error_msg:
                cc = to_do_map[future]   # look the country code up in the dict
                print(f'{cc} error: {error_msg}')

    return counter

if __name__ == '__main__':
    from flags2_common import main
    main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)

Sample run:

$ python3 flags2_threadpool.py -s DELAY a b c
DELAY site: http://localhost:8001/flags
Searching for 78 flags: from AA to CZ
30 concurrent connections will be used.
--------------------
43 flags downloaded.
35 not found.
Elapsed time: 1.72s

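Important idiom: when using as_completed, keeping a future -> metadata dict is a very common, standard pattern. Because as_completed yields futures in completion order, the original ordering is lost, and the dict is what lets you recover the context of each task (which URL, which country code, and so on).

The sequential download_many (with tqdm and Counter statistics):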

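from collections import Counter
import httpx
import tqdm

# DownloadStatus and download_one are the ones defined in the earlier snippet
def download_many_sequential(cc_list, base_url, verbose, _unused):
    counter = Counter()
    cc_iter = sorted(cc_list)
    if not verbose:
        cc_iter = tqdm.tqdm(cc_iter)  # the sequential version wraps the list directly: it has len, so no total is needed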

    for cc in cc_iter:
        try:
            status = download_one(cc, base_url, verbose)
        except httpx.HTTPStatusError as exc:
            error_msg = f'HTTP {exc.response.status_code}'
            status = DownloadStatus.ERROR
        except httpx.RequestError as exc:
            error_msg = f'{exc}'
            status = DownloadStatus.ERROR
        except KeyboardInterrupt:
            break
        else:
            error_msg = ''

        counter[status] += 1
        if verbose and error_msg:
            print(f'{cc} error: {error_msg}')

    return counter

# Sample output:
# LOCAL site: http://localhost:8000/flags
# Searching for 20 flags: from BD to VN
# 1 concurrent connection will be used.
# 20 flags downloaded.
# Elapsed time: 0.10s

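Summary

Key points of the chapter

The core value of concurrent.futures is turning sequential logic into concurrent execution with minimal code changes. Its design philosophy is a high level of encapsulation, so application developers do not need to understand the details of managing threads and processes.
Choosing between ThreadPoolExecutor and ProcessPoolExecutor depends on the type of task: threads for I/O-bound work, processes for CPU-bound work. Both share the same Executor interface, so switching is cheap; sometimes it takes a one-line change.
A Future is a "claim ticket" for an asynchronous operation: you receive the ticket as soon as the task is submitted and redeem it for the result later. Do not create Futures yourself; obtain them through executor.submit(). Application code should not change the state of a future; state changes are the framework's job.
executor.map and submit + as_completed are two different ways of consuming concurrent results: the former is simple but returns results in order (a slow task can hold back the rest), while the latter is flexible and returns results in completion order (the usual choice in production code).
The future -> metadata dict idiom is the standard way to keep track of task context when using as_completed: when submitting a future, store it together with its original argument (a URL, a country code, and so on), then look the argument up when handling the result.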

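Personal analysis and observations

The design philosophy of concurrent.futures deserves reflection. It hides the complexity of thread management (creation, synchronization, exception propagation, result collection) behind a clean interface, so the programmer only has to think about "what do I want done" rather than "how do I coordinate several threads". That is what a good abstraction looks like. Compared with the early Java style of Thread + Runnable, the Executor design of concurrent.futures (itself inspired by Java's java.util.concurrent) is far more elegant.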

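The GIL does not make Python concurrency useless. For I/O-bound tasks, which make up the bulk of most web applications and data-collection jobs, a thread releases the GIL while it waits for I/O, so other threads can run. ThreadPoolExecutor works well in these scenarios, and speedups of 5 to 20 times are not unusual, depending on how much of the time is spent waiting. The GIL only becomes a real bottleneck for pure CPU computation; in that case switch to ProcessPoolExecutor, or rely on extension libraries such as Cython or NumPy that can release the GIL.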
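The effect is easy to observe with a rough timing sketch. It assumes that time.sleep is an acceptable stand-in for blocking I/O (it also releases the GIL while waiting); fake_io, run_sequential, run_threaded and compare are hypothetical names, and the measured times will vary from machine to machine.

```python
import time
from concurrent import futures


def fake_io(delay: float) -> float:
    """Stand-in for a blocking I/O call; sleep releases the GIL while waiting."""
    time.sleep(delay)
    return delay


def run_sequential(delays) -> list:
    return [fake_io(delay) for delay in delays]


def run_threaded(delays) -> list:
    with futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        return list(executor.map(fake_io, delays))


def compare(delays=(0.2, 0.2, 0.2, 0.2)):
    """Return (sequential_seconds, threaded_seconds) for the same workload."""
    t0 = time.perf_counter()
    run_sequential(delays)
    sequential = time.perf_counter() - t0

    t0 = time.perf_counter()
    run_threaded(delays)
    threaded = time.perf_counter() - t0
    return sequential, threaded


if __name__ == '__main__':
    sequential, threaded = compare()
    print(f'sequential: {sequential:.2f}s, threaded: {threaded:.2f}s')
```

Because the waits overlap, the threaded run takes roughly as long as the longest single wait, while the sequential run takes the sum of all waits. Replacing fake_io with a pure-Python computation would remove most of that gain.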

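The ordering guarantee of executor.map is a double-edged sword. It makes the code easier to follow (output order equals input order), but it also lets a slow task hold back the results of faster ones. The book's proc_pool.py experiment shows this clearly: the numbers are submitted in descending order, so the largest prime, which takes the longest to check, sits near the front of the list and every result behind it is held back until it finishes. In production code, when task durations vary widely, prefer the submit + as_completed combination.

A progress bar is more than a user-experience decoration. The deeper reason the book introduces tqdm in the flags2 series is that a progress bar which updates in real time calls for as_completed (one update per finished future), and as_completed is also the usual pattern in production-grade concurrent code. In that sense the progress bar pushes the design toward a better structure, which is an interesting positive constraint.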

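Relationship with asyncio. concurrent.futures is the traditional concurrency approach based on threads and processes, while asyncio is a single-threaded asynchronous approach based on coroutines. Each has its place: concurrent.futures is easier to integrate into existing synchronous code and cheap to adopt; asyncio is more memory-efficient in high-concurrency server scenarios, because coroutines use far less memory than threads and switching between them is much cheaper. Understanding the Future concept from this chapter is the key foundation for Chapter 21 on asyncio: the two share a similar interface design, and asyncio.Future closely resembles concurrent.futures.Future at the API level, although the underlying mechanisms are completely different.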
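One place where the two worlds meet is loop.run_in_executor, which runs a blocking function in an executor and returns an awaitable asyncio future. The sketch below is an illustration rather than material from this chapter: blocking_io and gather_blocking are hypothetical names, and time.sleep stands in for a blocking call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(n: int) -> int:
    """Hypothetical blocking function, e.g. a synchronous HTTP request."""
    time.sleep(0.1)
    return n * 10


async def gather_blocking() -> list:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=3) as pool:
        # each call returns an awaitable asyncio future backed by a pool thread
        pending = [loop.run_in_executor(pool, blocking_io, n) for n in range(3)]
        # gather returns the results in the order the awaitables were passed in
        return await asyncio.gather(*pending)


if __name__ == '__main__':
    print(asyncio.run(gather_blocking()))
```

This keeps the event loop responsive while existing synchronous code does its work in threads, which is one way to migrate gradually from the executor style to coroutines.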
