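Python Backend Development Blog Series | Part 09: A Deep Dive into the GIL and Concurrent Programming in Practice -- Choosing Among Multithreading, Multiprocessing, and Coroutines

Difficulty: Advanced
Audience: Developers with Python fundamentals; mid-level and senior engineers preparing for interviews
Prerequisites: Part 04, "Python Memory Management and Garbage Collection"; Part 08, "Context Managers and the Type System"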

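Introduction

"Is Python multithreading fake?" This may be the most frequently asked question in Python interviews. To answer it accurately, you need a solid understanding of the GIL (Global Interpreter Lock).

The GIL is a mutex in the CPython interpreter that ensures only one thread executes Python bytecode at any moment. Even if a Python process starts 10 threads, only one of them is executing Python code at any instant. For CPU-bound pure Python code, this is a hard limit. Multiple threads bring no speedup, and the extra thread-switching overhead can make them slightly slower than a single thread.

The GIL still has its uses. For I/O-bound work (network requests, file reads and writes, database queries), a thread releases the GIL while it waits on I/O, which lets other threads run. This is why a concurrent crawler built on the requests library benefits from multithreading.

This article examines how the GIL works at the level of CPython's implementation. It then covers practical use of the three main concurrency modules (threading, multiprocessing, concurrent.futures) and ends with a decision tree for choosing a concurrency model.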

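Learning Objectives

After reading this article, you will be able to:

Explain, at the CPython implementation level, why the GIL exists, how it is acquired and released, and how it affects different kinds of tasks
Use the threading module to create threads, apply its synchronization primitives (Lock, RLock, Semaphore, Event, Condition), and avoid deadlocks
Use inter-process communication (Queue, Pipe) and shared memory (Value, Array) in the multiprocessing module
Use ThreadPoolExecutor and ProcessPoolExecutor from concurrent.futures fluently
Choose a suitable concurrency model based on the task's characteristics (CPU-bound / I/O-bound / mixed)
Describe the progress of free-threaded CPython in Python 3.13 (PEP 703)
Answer common interview questions about the GIL, multithreading, multiprocessing, and coroutines accurately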

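I. The Nature and Impact of the GIL

1.1 Historical Background and Design Trade-offs

The GIL dates back to Python's early years in the 1990s. Multi-core CPUs were not yet common, and Python's creator, Guido van Rossum, made a pragmatic trade-off:

Why is the GIL needed?

CPython uses reference counting as its primary memory-management mechanism (see Part 04). Every Python object has an ob_refcnt field. Without the GIL, several threads could update the same object's reference count at the same time, which creates a race condition: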


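# Pseudocode: a race condition without the GIL
# Threads A and B both execute x = some_object at the same time

# Thread A: read refcnt (= 1)
# Thread B: read refcnt (= 1)
# Thread A: write refcnt (= 1 + 1 = 2)
# Thread B: write refcnt (= 1 + 1 = 2)  # expected 3, actually 2!
# Result: wrong refcount -> object may be freed too early -> segfault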

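Without the GIL, every object would need its own fine-grained lock. That approach has three costs:

Performance overhead: every reference-count change would need a lock and an unlock, which substantially slows down single-threaded code
Complexity: almost every C extension would have to be rewritten to support fine-grained locking
Deadlock risk: many fine-grained locks can easily deadlock with one another

The GIL is a "crude but effective" solution: one big lock protects the entire interpreter state.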
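
The GIL protects the interpreter's own state, such as reference counts. It does not protect your program's invariants, because the interpreter may switch threads between bytecode instructions. The sketch below uses the standard dis module to show that even `counter += 1` compiles to several instructions. Exact opcode names vary across CPython versions. This is why shared counters still need a Lock.

```python
import dis

counter = 0


def increment() -> None:
    global counter
    counter += 1  # one line of source, but several bytecode instructions


# Opcode names differ by version (e.g. INPLACE_ADD before 3.11, BINARY_OP from 3.11),
# but the shape is always: load the value, add, store it back.
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)
```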

1.2 How the GIL Is Acquired and Released


import sys
import threading
import time


# GIL switch interval (Python 3.2+)
print(f"GIL switch interval: {sys.getswitchinterval()}s")
# Default: 0.005 seconds (5 ms)

# The interval can be changed (the setting is process-wide)
# sys.setswitchinterval(0.001)  # 1 ms
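
The switch interval applies to the whole process, so when you experiment with it, restore the previous value afterwards. A small context manager built on sys.getswitchinterval() and sys.setswitchinterval() handles this. The name `switch_interval` is an illustrative helper.

```python
import contextlib
import sys
from typing import Iterator


@contextlib.contextmanager
def switch_interval(seconds: float) -> Iterator[float]:
    """Temporarily change the interpreter's thread switch interval."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)      # affects every thread in the process
    try:
        yield sys.getswitchinterval()
    finally:
        sys.setswitchinterval(previous)  # always restore the old value


with switch_interval(0.001) as active:
    print(f"inside: {active}s")
print(f"restored: {sys.getswitchinterval()}s")
```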

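When the GIL is released:

I/O operations: when a thread performs blocking I/O (socket.recv, file.read, time.sleep, and so on), it releases the GIL voluntarily
Periodic release: when another thread is waiting for the GIL and the current holder has run for the switch interval (5 ms by default), the holder is asked to drop the GIL so the waiting thread can run
Explicit release in C extensions: C/C++ extensions can release the GIL manually with the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros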
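
You can observe the first release point directly. In the sketch below, a background thread runs a pure Python loop, which can only make progress while it holds the GIL. The main thread then calls time.sleep(). The worker's counter keeps rising during that sleep, which shows that the sleeping thread gave up the GIL. Variable names are illustrative.

```python
import threading
import time

ticks = 0
stop = threading.Event()


def spin() -> None:
    """Pure Python busy loop: it can only make progress while holding the GIL."""
    global ticks
    while not stop.is_set():
        ticks += 1


worker = threading.Thread(target=spin)
worker.start()
before = ticks
time.sleep(0.05)  # the main thread blocks here and releases the GIL
progress_during_sleep = ticks - before
stop.set()
worker.join()
print(f"Worker iterations while main thread slept: {progress_during_sleep}")
```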


# Demonstrate the GIL's impact on CPU-bound tasks
import time
import threading


def cpu_bound(n: int) -> int:
    """CPU 密集型：纯计算"""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_bound(seconds: float) -> None:
    """I/O 密集型：等待"""
    time.sleep(seconds)


# ========== CPU-bound: multithreading gives no speedup ==========
N = 2_000_000

# Single thread
start = time.perf_counter()
cpu_bound(N)
cpu_bound(N)
single_time = time.perf_counter() - start
print(f"CPU-bound single thread: {single_time:.3f}s")

# Two threads
start = time.perf_counter()
t1 = threading.Thread(target=cpu_bound, args=(N,))
t2 = threading.Thread(target=cpu_bound, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
multi_time = time.perf_counter() - start
print(f"CPU-bound two threads:   {multi_time:.3f}s")
print(f"Speedup: {single_time / multi_time:.2f}x")
# Typically speedup ≈ 0.9x-1.0x: no gain from threads, sometimes slightly slower


# ========== I/O-bound: multithreading helps ==========
start = time.perf_counter()
io_bound(0.1)
io_bound(0.1)
single_io = time.perf_counter() - start
print(f"\nIO-bound single thread: {single_io:.3f}s")

start = time.perf_counter()
t1 = threading.Thread(target=io_bound, args=(0.1,))
t2 = threading.Thread(target=io_bound, args=(0.1,))
t1.start(); t2.start()
t1.join(); t2.join()
multi_io = time.perf_counter() - start
print(f"IO-bound two threads:   {multi_io:.3f}s")
print(f"Speedup: {single_io / multi_io:.2f}x")
# speedup ≈ 2.0x: multithreading is effective for I/O-bound work

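1.3 Free-threaded CPython in Python 3.13 (PEP 703)

Python 3.13 (released October 2024) introduced an experimental free-threaded mode, built with the --disable-gil configure option. This is the first time CPython has officially shipped a way to run without the GIL:

Key changes:

Reference counting uses biased reference counting: the thread that owns an object updates its count with cheap non-atomic operations, and other threads use atomic operations
Per-object locks replace the single global lock
The memory allocator is switched to the thread-safe mimalloc
Built-in types such as dict and list protect their internals with fine-grained locks

Current status (as of Python 3.13):

Marked experimental; requires a separate Python build (python3.13t)
Single-threaded code runs noticeably slower. The official free-threading HOWTO puts the 3.13 overhead at about 40% on the pyperformance suite, mainly because the specializing adaptive interpreter (PEP 659) is disabled in this build. The stated goal for later releases is 10% or less
Many third-party C extensions have not yet been adapted
Python 3.14 continues the work: PEP 779 moves the free-threaded build from experimental to officially supported, although it is still not the default build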


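# Check whether this is a free-threaded build, and whether the GIL is active right now
import sys
import sysconfig

# Build level: was CPython compiled with --disable-gil (e.g. python3.13t)?
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
# Runtime level: sys._is_gil_enabled() exists on 3.13+; older versions always have a GIL
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"Free-threaded build: {free_threaded_build}")
print(f"GIL enabled: {gil_enabled}")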

II. Multithreaded Programming (threading)

2.1 Creating and Managing Threads


import threading
import time
from typing import List


# ========== Option 1: pass a target function ==========
def download_file(url: str, delay: float = 0.05) -> str:
    """模拟文件下载"""
    thread_name = threading.current_thread().name
    time.sleep(delay)  # 模拟 I/O
    return f"[{thread_name}] Downloaded: {url}"


threads: List[threading.Thread] = []
urls = [f"https://example.com/file_{i}" for i in range(5)]

start = time.perf_counter()
for url in urls:
    t = threading.Thread(target=download_file, args=(url,), name=f"DL-{url[-1]}")
    threads.append(t)
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"5 downloads with threading: {elapsed:.3f}s (vs ~0.25s sequential)")


# ========== Option 2: subclass Thread ==========
class WorkerThread(threading.Thread):
    def __init__(self, task_id: int):
        super().__init__(name=f"Worker-{task_id}")
        self.task_id = task_id
        self.result = None

    def run(self):
        time.sleep(0.01)
        self.result = self.task_id * 10


workers = [WorkerThread(i) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
results = [w.result for w in workers]
print(f"Worker results: {results}")  # [0, 10, 20]


# ========== Daemon threads ==========
def background_task():
    while True:
        time.sleep(0.01)

daemon = threading.Thread(target=background_task, daemon=True)
daemon.start()
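# Daemon threads are stopped abruptly when the program exits (once all non-daemon threads finish); no join needed, but they get no chance to clean up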
print(f"Daemon alive: {daemon.is_alive()}")  # True

2.2 Thread Synchronization Primitives


import threading
import time
from typing import List


# ========== Lock: mutual exclusion ==========
class BankAccount:
    """线程安全的银行账户"""

    def __init__(self, balance: float = 0):
        self._balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: float) -> None:
        with self._lock:
            current = self._balance
            time.sleep(0.001)  # simulate latency
            self._balance = current + amount

    def withdraw(self, amount: float) -> bool:
        with self._lock:
            if self._balance >= amount:
                current = self._balance
                time.sleep(0.001)
                self._balance = current - amount
                return True
            return False

    @property
    def balance(self) -> float:
        return self._balance


account = BankAccount(1000)
threads = []

# 50 threads each deposit 10
for _ in range(50):
    t = threading.Thread(target=account.deposit, args=(10,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(f"Balance after 50 deposits of 10: {account.balance}")  # 1500.0


# ========== RLock: reentrant lock ==========
class SafeCounter:
    """使用 RLock 允许同一线程多次获取锁"""

    def __init__(self):
        self._count = 0
        self._lock = threading.RLock()

    def increment(self):
        with self._lock:
            self._count += 1

    def increment_twice(self):
        with self._lock:  # first acquisition
            self.increment()  # second acquisition (same thread, allowed by RLock)
            self._count += 1

    @property
    def count(self):
        return self._count


counter = SafeCounter()
counter.increment_twice()
print(f"RLock counter: {counter.count}")  # 2


# ========== Semaphore ==========
class ConnectionPool:
    """用信号量限制同时连接数"""

    def __init__(self, max_connections: int = 3):
        self._semaphore = threading.Semaphore(max_connections)
        self._active = 0
        self._lock = threading.Lock()
        self._max_seen = 0

    def connect(self, task_id: int) -> None:
        with self._semaphore:
            with self._lock:
                self._active += 1
                self._max_seen = max(self._max_seen, self._active)
            time.sleep(0.02)  # simulate using the connection
            with self._lock:
                self._active -= 1


pool = ConnectionPool(max_connections=3)
threads = [threading.Thread(target=pool.connect, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Max concurrent connections: {pool._max_seen}")  # <= 3


# ========== Event: signaling between threads ==========
data_ready = threading.Event()
shared_data = {"value": None}


def producer():
    time.sleep(0.02)
    shared_data["value"] = 42
    data_ready.set()  # send the signal


def consumer():
    data_ready.wait(timeout=1.0)  # wait for the signal
    return shared_data["value"]


t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(f"Event shared data: {shared_data['value']}")  # 42


# ========== Condition: condition variable ==========
class BoundedBuffer:
    """基于 Condition 的有界缓冲区（生产者-消费者模式）"""

    def __init__(self, capacity: int = 5):
        self._buffer: List = []
        self._capacity = capacity
        self._condition = threading.Condition()

    def produce(self, item):
        with self._condition:
            while len(self._buffer) >= self._capacity:
                self._condition.wait()  # buffer full: wait
            self._buffer.append(item)
            self._condition.notify()  # wake the consumer (one shared condition: fine for one producer + one consumer)

    def consume(self):
        with self._condition:
            while not self._buffer:
                self._condition.wait()  # buffer empty: wait
            item = self._buffer.pop(0)
            self._condition.notify()  # wake the producer
            return item


buf = BoundedBuffer(capacity=3)
produced = []
consumed = []

def producer_fn():
    for i in range(5):
        buf.produce(i)
        produced.append(i)

def consumer_fn():
    for _ in range(5):
        item = buf.consume()
        consumed.append(item)

tp = threading.Thread(target=producer_fn)
tc = threading.Thread(target=consumer_fn)
tp.start(); tc.start()
tp.join(); tc.join()
print(f"Produced: {produced}")  # [0, 1, 2, 3, 4]
print(f"Consumed: {consumed}")  # [0, 1, 2, 3, 4]

2.3 A Thread-Safe Data Structure: queue.Queue


import queue
import threading
import time


def worker(q: queue.Queue, results: list, lock: threading.Lock):
    while True:
        try:
            item = q.get(timeout=0.1)  # exit after 0.1 s idle: safe only because all tasks are queued before workers start
        except queue.Empty:
            break
        # Process the task
        result = item * 2
        with lock:
            results.append(result)
        q.task_done()


# Create the task queue
task_queue: queue.Queue = queue.Queue()
for i in range(10):
    task_queue.put(i)

results: list = []
lock = threading.Lock()

# Create worker threads
threads = [
    threading.Thread(target=worker, args=(task_queue, results, lock))
    for _ in range(3)
]
for t in threads:
    t.start()

# Wait until every task has been processed
task_queue.join()
for t in threads:
    t.join()

print(f"Queue results: {sorted(results)}")  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

2.4 How Deadlocks Happen and How to Avoid Them


import threading
import time


# ========== Deadlock example ==========
lock_a = threading.Lock()
lock_b = threading.Lock()
deadlock_occurred = False

def thread_1_deadlock():
    global deadlock_occurred
    lock_a.acquire()
    time.sleep(0.01)
    # Try to get lock_b, but thread_2 holds lock_b and is waiting for lock_a
    acquired = lock_b.acquire(timeout=0.1)
    if not acquired:
        deadlock_occurred = True
    lock_a.release()
    if acquired:
        lock_b.release()

def thread_2_deadlock():
    lock_b.acquire()
    time.sleep(0.01)
    # Try to get lock_a, but thread_1 holds lock_a and is waiting for lock_b
    acquired = lock_a.acquire(timeout=0.1)
    lock_b.release()
    if acquired:
        lock_a.release()

t1 = threading.Thread(target=thread_1_deadlock)
t2 = threading.Thread(target=thread_2_deadlock)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Deadlock detected (via timeout): {deadlock_occurred}")


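# ========== Avoiding deadlock: lock ordering ==========
# Reuses BankAccount from section 2.2 (one plain, non-reentrant Lock per account)
def safe_transfer(
    from_account: BankAccount,
    to_account: BankAccount,
    amount: float,
) -> bool:
    """Avoid deadlock by always acquiring locks in a fixed order"""
    if from_account is to_account:
        # Both locks would be the same non-reentrant Lock: taking it twice would self-deadlock
        return False
    # Always acquire in id() order so that every thread uses the same order
    first, second = sorted(
        [from_account, to_account],
        key=lambda a: id(a)
    )
    with first._lock:
        with second._lock:
            if from_account.balance >= amount:
                from_account._balance -= amount
                to_account._balance += amount
                return True
            return False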


a1 = BankAccount(1000)
a2 = BankAccount(500)

# Threads transfer in both directions without deadlocking
threads = []
for i in range(10):
    if i % 2 == 0:
        t = threading.Thread(target=safe_transfer, args=(a1, a2, 10))
    else:
        t = threading.Thread(target=safe_transfer, args=(a2, a1, 10))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(f"Total after transfers: {a1.balance + a2.balance}")  # 1500.0（守恒）

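The four necessary conditions for deadlock (a common interview topic):

Mutual exclusion: a resource cannot be shared; only one thread can use it at a time
Hold and wait: a thread holds one lock while waiting to acquire another
No preemption: a lock cannot be forcibly taken away from its holder
Circular wait: the threads form a cycle of lock dependencies

Avoidance strategy: breaking any one condition is enough. The most common techniques are lock ordering, which breaks circular wait, and timeouts with lock.acquire(timeout=...), which let a thread give up and release what it holds instead of waiting indefinitely.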
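
The timeout strategy can be turned into a small retry loop. Acquire the first lock, then try the second with a timeout. If that fails, release the first and back off for a random moment before retrying. Releasing what you hold breaks the hold-and-wait condition, and the random delay makes it unlikely that two threads keep colliding in lockstep (livelock). `acquire_both` is a hypothetical helper, not a threading API.

```python
import random
import threading
import time


def acquire_both(first: threading.Lock, second: threading.Lock, timeout: float = 0.05) -> None:
    """Hypothetical helper: return only once both locks are held."""
    while True:
        first.acquire()
        if second.acquire(timeout=timeout):
            return  # caller now holds both and must release both
        first.release()  # drop what we hold instead of waiting while holding it
        time.sleep(random.uniform(0, 0.01))  # random back-off reduces livelock


lock_a = threading.Lock()
lock_b = threading.Lock()
finished: list[str] = []


def task(name: str, x: threading.Lock, y: threading.Lock, rounds: int = 20) -> None:
    for _ in range(rounds):
        acquire_both(x, y)
        try:
            time.sleep(0.001)  # critical section that needs both locks
        finally:
            y.release()
            x.release()
    finished.append(name)


# Opposite acquisition orders: the classic deadlock setup
t1 = threading.Thread(target=task, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=task, args=("t2", lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(finished))  # ['t1', 't2']
```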

III. Multiprocess Programming (multiprocessing)

3.1 Inter-Process Communication


import multiprocessing
import time
import os


# ========== Queue: inter-process message queue ==========
def producer_process(q: multiprocessing.Queue, count: int):
    for i in range(count):
        q.put({"id": i, "pid": os.getpid()})
    q.put(None)  # sentinel value


def consumer_process(q: multiprocessing.Queue, results: list):
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item["id"])


if __name__ == "__main__":
    q: multiprocessing.Queue = multiprocessing.Queue()

    # Use a Manager to create a shared list
    manager = multiprocessing.Manager()
    results = manager.list()

    p1 = multiprocessing.Process(target=producer_process, args=(q, 5))
    p2 = multiprocessing.Process(target=consumer_process, args=(q, results))

    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print(f"Process Queue results: {sorted(results)}")  # [0, 1, 2, 3, 4]

3.2 Shared Memory


import multiprocessing
import ctypes


def increment_shared(counter, lock, n):
    for _ in range(n):
        with lock:  # += on a shared Value is read-modify-write, so guard it
            counter.value += 1


def fill_array(shared_arr, index, value):
    # Defined at module level (not inside the __main__ block) so that child
    # processes can find it under the "spawn" start method (Windows, macOS)
    shared_arr[index] = value


if __name__ == "__main__":
    # ========== Value: share a single value ==========
    counter = multiprocessing.Value(ctypes.c_int, 0)
    lock = multiprocessing.Lock()

    processes = [
        multiprocessing.Process(
            target=increment_shared,
            args=(counter, lock, 1000)
        )
        for _ in range(4)
    ]

    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print(f"Shared counter: {counter.value}")  # 4000


    # ========== Array: share an array ==========
    arr = multiprocessing.Array(ctypes.c_double, [0.0, 0.0, 0.0])

    p1 = multiprocessing.Process(target=fill_array, args=(arr, 0, 1.1))
    p2 = multiprocessing.Process(target=fill_array, args=(arr, 1, 2.2))
    p3 = multiprocessing.Process(target=fill_array, args=(arr, 2, 3.3))

    for p in [p1, p2, p3]:
        p.start()
    for p in [p1, p2, p3]:
        p.join()

    print(f"Shared array: {list(arr)}")  # [1.1, 2.2, 3.3]

3.3 Process Pools


import multiprocessing
import time


def is_prime(n: int) -> bool:
    """判断素数（CPU 密集型）"""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


if __name__ == "__main__":
    numbers = list(range(100000, 101000))

    # Single process
    start = time.perf_counter()
    single_results = [is_prime(n) for n in numbers]
    single_time = time.perf_counter() - start
    print(f"Single process: {single_time:.3f}s, primes={sum(single_results)}")

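    # Process pool (for a workload this small, process start-up and pickling overhead can offset much of the gain)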
    start = time.perf_counter()
    with multiprocessing.Pool(processes=4) as pool:
        multi_results = pool.map(is_prime, numbers)
    multi_time = time.perf_counter() - start
    print(f"Process pool(4): {multi_time:.3f}s, primes={sum(multi_results)}")
    print(f"Results match: {single_results == multi_results}")

IV. The Unified concurrent.futures Interface

4.1 ThreadPoolExecutor vs ProcessPoolExecutor

concurrent.futures provides a unified high-level interface that makes it easy to switch between thread pools and process pools:


from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed
import time


def cpu_task(n: int) -> int:
    """CPU 密集型任务"""
    return sum(i * i for i in range(n))


def io_task(url: str) -> str:
    """I/O 密集型任务（模拟）"""
    time.sleep(0.05)  # 模拟网络请求
    return f"Response from {url}"


# ========== ThreadPoolExecutor: suited to I/O-bound work ==========
urls = [f"https://api.example.com/data/{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(io_task, urls))
elapsed = time.perf_counter() - start
print(f"ThreadPool (10 IO tasks): {elapsed:.3f}s, got {len(results)} results")


# ========== submit + as_completed: handle each result as soon as it finishes ==========
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(io_task, url): url
        for url in urls[:5]
    }
    completed_order = []
    for future in as_completed(futures):
        url = futures[future]
        result = future.result()
        completed_order.append(url)

print(f"Completed {len(completed_order)} tasks")


# ========== Future objects and callbacks ==========
def on_complete(future):
    """Callback: receives the finished Future"""
    result = future.result()
    print(f"  Callback: {result[:30]}...")

with ThreadPoolExecutor(max_workers=3) as executor:
    future = executor.submit(io_task, "https://api.example.com/callback")
    future.add_done_callback(on_complete)
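    # The callback fires once the future is done: in the worker thread that completed it, or immediately if it had already finished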
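
The block above only exercises ThreadPoolExecutor. For the CPU-bound cpu_task, switching to processes is, as far as the API goes, just a change of class. The sketch below shows this. The `run_cpu_tasks` helper and the task sizes are illustrative. ProcessPoolExecutor starts fresh interpreter processes, and under the spawn start method they re-import the main module. So the task function must be defined at module level, and the pool must be created behind `if __name__ == "__main__":`.

```python
from concurrent.futures import ProcessPoolExecutor
import time


def cpu_task(n: int) -> int:
    """CPU-bound: pure Python arithmetic that holds the GIL throughout."""
    return sum(i * i for i in range(n))


def run_cpu_tasks(sizes: list[int], max_workers: int = 4) -> list[int]:
    """Illustrative helper: same Executor API as ThreadPoolExecutor, but each
    worker is a separate process with its own interpreter and GIL."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(cpu_task, sizes))


if __name__ == "__main__":
    sizes = [2_000_000] * 4  # illustrative workload
    start = time.perf_counter()
    totals = run_cpu_tasks(sizes)
    elapsed = time.perf_counter() - start
    print(f"ProcessPool (4 CPU tasks): {elapsed:.3f}s, got {len(totals)} results")
```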

4.2 map vs submit + as_completed


from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def process_item(item: int) -> dict:
    """处理单个项目（模拟不同耗时）"""
    delay = 0.01 * (item % 5 + 1)
    time.sleep(delay)
    return {"item": item, "result": item * 2}


items = list(range(10))


# ========== map: results in input order; each step of iteration blocks until the next result is ready ==========
with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.perf_counter()
    results_map = list(executor.map(process_item, items))
    elapsed_map = time.perf_counter() - start

# Results come back in the same order as the inputs
map_items = [r["item"] for r in results_map]
print(f"map: {elapsed_map:.3f}s, order={map_items}")


# ========== as_completed: yields futures as they finish, useful when you want the earliest results first ==========
with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.perf_counter()
    futures = {executor.submit(process_item, item): item for item in items}
    results_ac = []
    for future in as_completed(futures):
        results_ac.append(future.result())
    elapsed_ac = time.perf_counter() - start

ac_items = [r["item"] for r in results_ac]
print(f"as_completed: {elapsed_ac:.3f}s, order={ac_items}")
# Note: as_completed yields in completion order, not input order

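Choosing between map and as_completed:

Feature	`executor.map()`	`submit()` + `as_completed()`
Result order	Input order	Completion order
Exception handling	Raised during iteration	Via `future.result()`
Typical use	Batches of homogeneous tasks	Handling results as early as possible
Callback support	No	`future.add_done_callback()`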
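
The "raised during iteration" row has a practical consequence. With executor.map(), an exception surfaces when iteration reaches the failing input, and that iterator delivers no results after it. A small sketch with an illustrative `parse` function:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def parse(value: str) -> int:
    return int(value)  # raises ValueError for non-numeric input


inputs = ["1", "2", "oops", "4"]
collected: list[int] = []
error: Optional[Exception] = None

with ThreadPoolExecutor(max_workers=2) as executor:
    try:
        for result in executor.map(parse, inputs):
            collected.append(result)
    except ValueError as exc:
        error = exc  # re-raised here when iteration reaches "oops"

print(f"collected={collected}, error={error!r}")
```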

4.3 Exception Handling and Timeouts


from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time


def risky_task(task_id: int) -> str:
    if task_id == 3:
        raise ValueError(f"Task {task_id} failed!")
    time.sleep(0.02)
    return f"Task {task_id} OK"


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(risky_task, i): i for i in range(5)}

    for future in futures:
        task_id = futures[future]
        try:
            result = future.result(timeout=1.0)
            print(f"  {result}")
        except ValueError as e:
            print(f"  Task {task_id} error: {e}")
        except TimeoutError:
            print(f"  Task {task_id} timed out")

V. A Decision Tree for Choosing a Concurrency Model

5.1 Selection Guide


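What kind of task is it?
│
├── CPU-bound (computation, data processing, encryption, compression)
│   ├── Pure Python computation → multiprocessing / ProcessPoolExecutor
│   ├── NumPy/SciPy computation → threads can help (many heavy NumPy operations release the GIL)
│   └── Large amounts of shared data → multiprocessing + SharedMemory
│
├── I/O-bound (network requests, file I/O, database queries)
│   ├── Modest concurrency (< 100) → ThreadPoolExecutor
│   ├── High concurrency (100+) → asyncio (coroutines)
│   └── Must mix with synchronous code → ThreadPoolExecutor
│
└── Mixed (both CPU and I/O)
    ├── Process pool + thread pool
    └── Process pool + asyncio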
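
For the "process pool + asyncio" branch of the tree, the usual bridge is loop.run_in_executor(). It wraps an Executor job in an awaitable, so CPU work runs off the event loop while I/O stays on it. The sketch below rests on a few assumptions. `fetch` stands in for real async I/O using asyncio.sleep(), `crunch` is an illustrative CPU-bound function, and any Executor can be passed in (a ProcessPoolExecutor in the `__main__` block).

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


def crunch(n: int) -> int:
    """Illustrative CPU-bound step."""
    return sum(i * i for i in range(n))


async def fetch(i: int) -> int:
    """Stand-in for real async I/O (e.g. an HTTP call); returns a workload size."""
    await asyncio.sleep(0.01)
    return 10_000 + i


async def pipeline(executor: Executor, count: int) -> list[int]:
    loop = asyncio.get_running_loop()
    # I/O stays on the event loop: all fetches wait concurrently
    sizes = await asyncio.gather(*(fetch(i) for i in range(count)))
    # CPU work goes to the executor so it never blocks the event loop
    jobs = [loop.run_in_executor(executor, crunch, n) for n in sizes]
    return list(await asyncio.gather(*jobs))


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(asyncio.run(pipeline(pool, 4)))
```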

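5.2 Comparison Table

Model	Suited to	Practical concurrency	Memory overhead	GIL impact	Data sharing
`threading`	I/O-bound	~hundreds	Low (shared memory)	Limited by GIL	Simple (shared process memory)
`multiprocessing`	CPU-bound	~number of CPU cores	High (separate memory)	Not limited	Requires IPC
`asyncio`	I/O-bound	thousands to tens of thousands	Very low	Not relevant (single thread)	Simple (within one thread)
`ThreadPoolExecutor`	I/O-bound	~hundreds	Low	Limited by GIL	Simple
`ProcessPoolExecutor`	CPU-bound	~number of CPU cores	High	Not limited	Requires serialization (pickle)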

5.3 Choosing a Model for Real Scenarios


from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time


# ========== Scenario 1: web crawler (I/O-bound) → ThreadPoolExecutor ==========
def fetch_url(url: str) -> str:
    time.sleep(0.02)  # simulate a network request
    return f"Content of {url}"


# ========== Scenario 2: image processing (CPU-bound) → ProcessPoolExecutor ==========
def process_image(image_id: int) -> dict:
    """Simulate CPU-heavy image processing"""
    total = sum(i * i for i in range(50000))
    return {"id": image_id, "checksum": total % 1000}


# ========== Scenario 3: mixed task ==========
def mixed_task(task_id: int) -> dict:
    """Mixed CPU + I/O task"""
    # CPU part (small here, so a thread pool is acceptable; if it dominates,
    # combine a process pool with threads or asyncio as in the decision tree)
    result = sum(i ** 2 for i in range(10000))
    # I/O part
    time.sleep(0.01)
    return {"id": task_id, "result": result}


if __name__ == "__main__":
    # Under the "spawn" start method (default on Windows and macOS),
    # ProcessPoolExecutor workers re-import this module, so all top-level
    # work must sit behind this guard
    urls = [f"https://example.com/page/{i}" for i in range(20)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as executor:
        pages = list(executor.map(fetch_url, urls))
    print(f"Web crawler (20 urls, 10 workers): {time.perf_counter()-start:.3f}s")

    images = list(range(8))
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as executor:
        processed = list(executor.map(process_image, images))
    print(f"Image processing (8 images, 4 processes): {time.perf_counter()-start:.3f}s")

    tasks = list(range(10))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as executor:
        mixed = list(executor.map(mixed_task, tasks))
    print(f"Mixed tasks (10 tasks, 5 workers): {time.perf_counter()-start:.3f}s")

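VI. Frequently Asked Interview Questions

Q1: What is the GIL, and how does it affect multithreading in Python?

A: The GIL (Global Interpreter Lock) is a mutex in the CPython interpreter that ensures only one thread executes Python bytecode at a time. It exists because CPython's reference-counting memory management is not thread-safe on its own.

The GIL's impact depends on the kind of task:

CPU-bound: threads cannot run Python code in parallel, because only the thread holding the GIL can execute it. Performance is close to, or even worse than, single-threaded performance (because of thread-switching overhead)
I/O-bound: threads are effective, because a thread releases the GIL while it performs I/O, which lets other threads run

The GIL is not part of the Python language specification; it is an implementation detail of CPython. Implementations such as Jython and IronPython have no GIL. Python 3.13 introduced an experimental free-threaded mode (PEP 703).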

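Q2: When should you use multithreading, multiprocessing, and coroutines?

A:

Multithreading (threading): suited to I/O-bound tasks such as network requests, database queries, and file I/O. Threads share process memory, so sharing data is easy. Because of the GIL, it is not suited to CPU-bound pure Python work
Multiprocessing (multiprocessing): suited to CPU-bound tasks such as data processing, image processing, and scientific computing. Each process has its own GIL and memory space, so work can truly run in parallel across cores. The downside is that inter-process communication and data sharing are more complex
Coroutines (asyncio): suited to highly concurrent I/O-bound workloads such as web servers and API gateways. An event loop schedules coroutines within a single thread with no thread-switching overhead, so thousands of concurrent connections are easy to support. The downside is that the entire call chain must use async/await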

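Q3: How do you avoid deadlocks in Python?

A: A deadlock requires all four necessary conditions at once: mutual exclusion, hold and wait, no preemption, and circular wait. Avoidance strategies:

Lock ordering: have every thread acquire locks in the same fixed order, which breaks the circular-wait condition. In practice you can sort by object id
Timeouts: use lock.acquire(timeout=...) so that a thread gives up after the timeout, releases what it holds, and retries
Avoid nested locks: try not to acquire one lock while holding another
Use with statements: they guarantee that a lock is released, even if an exception is raised
Use higher-level abstractions: tools such as queue.Queue and concurrent.futures handle synchronization internally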

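Q4: What is the difference between `ThreadPoolExecutor` and `ProcessPoolExecutor`?

A: Both implement the Executor interface with the same methods (submit, map, shutdown), and both work with the module-level helpers as_completed and wait. The difference lies in the underlying implementation:

Feature	`ThreadPoolExecutor`	`ProcessPoolExecutor`
Workers	Threads	Processes
GIL impact	Limited by GIL	Not limited (one GIL per process)
Memory	Shared	Separate (data must be serialized)
Overhead	Low	High (process creation / IPC)
Data passing	Direct references	pickle serialization
Typical use	I/O-bound	CPU-bound

Note: with ProcessPoolExecutor, the callable, its arguments, and its return value must all be picklable. Lambdas, locally defined functions, and closures cannot be passed to ProcessPoolExecutor.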
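
A quick way to check the picklability requirement before handing a callable to ProcessPoolExecutor is to try pickle.dumps() on it. Functions are pickled by reference (module plus qualified name), which is why lambdas and locally defined functions fail. The `can_pickle` helper below is an illustrative sketch. The exact exception type varies across Python versions, so it catches the usual candidates.

```python
import pickle


def can_pickle(obj: object) -> bool:
    """Return True if obj survives pickle.dumps()."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_adder(k: int):
    def add(x: int) -> int:  # local function: its qualified name contains <locals>
        return x + k
    return add


print(can_pickle(abs))              # True: built-in, pickled by reference
print(can_pickle(lambda x: x * 2))  # False
print(can_pickle(make_adder(1)))    # False
```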

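Q5: What will free-threaded CPython in Python 3.13 change?

A: PEP 703 introduced an experimental no-GIL mode. The core changes are biased reference counting (non-atomic updates by the owning thread, atomic updates by other threads), per-object locks, and a thread-safe memory allocator (mimalloc). With these changes, Python threads can run CPU-bound code truly in parallel. In 3.13, however, the mode is still experimental. Single-threaded code pays a noticeable overhead: the official HOWTO cites about 40% on pyperformance and targets 10% or less in later releases. Many third-party C extensions have not yet been adapted. Python 3.14 moves the free-threaded build to officially supported status (PEP 779), although it is still not the default build.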

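Chapter Summary

This article covered the core of concurrent programming in Python:

The nature of the GIL: the GIL is a global mutex that CPython uses to keep reference counting thread-safe. It is released during I/O and at the switch interval. As a result, multithreading is effective for I/O-bound tasks but not for CPU-bound ones. Python 3.13's free-threaded mode is the first step toward lifting this limit.
Multithreading (threading): suited to I/O-bound tasks. The core synchronization primitives are Lock (mutex), RLock (reentrant lock), Semaphore, Event (thread signaling), and Condition (condition variable). queue.Queue is a thread-safe way to pass data between threads. Deadlocks can be avoided with lock ordering and timeouts.
Multiprocessing (multiprocessing): suited to CPU-bound tasks; each process has its own GIL. Processes communicate through Queue/Pipe and share data through Value/Array/SharedMemory. Pool.map offers a concise parallel map interface.
concurrent.futures: provides a unified interface over ThreadPoolExecutor and ProcessPoolExecutor. map preserves input order, and submit + as_completed processes results in completion order. Future objects support callbacks, exception handling, and timeouts.
Choosing a model: I/O-bound → threads/asyncio; CPU-bound → processes; high-concurrency I/O → asyncio; mixed → a process pool combined with a thread pool or asyncio.

Core principle: do not reach for concurrency blindly. Concurrency adds complexity (race conditions, deadlocks, hard-to-debug failures), so introduce it only after you have confirmed a performance bottleneck that can actually be parallelized. Always profile first to confirm the bottleneck, then choose the concurrency model.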
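
For the "profile first" step, the standard library's cProfile and pstats are enough to show where time goes before you decide whether concurrency helps. A minimal sketch follows. `handler`, `slow_part`, and `fast_part` are illustrative stand-ins for real code.

```python
import cProfile
import io
import pstats


def slow_part(n: int) -> int:
    return sum(i * i for i in range(n))


def fast_part(n: int) -> int:
    return n * 2


def handler() -> int:
    """Illustrative request handler: which part is the bottleneck?"""
    return slow_part(200_000) + fast_part(10)


profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```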

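Coming Up Next

Part 10: The Complete Guide to asyncio Coroutines -- From the Event Loop to Production Practice

The next article dives into the core of asynchronous programming in Python. You will learn about:

What coroutines are: the underlying mechanics of async def and await, and how coroutines compare with threads and processes
The event loop: I/O multiplexing (select/epoll/kqueue), and asyncio.run() vs loop.run_until_complete()
Concurrency control: asyncio.gather, asyncio.create_task, asyncio.wait, asyncio.Semaphore
The async ecosystem: usage patterns for async libraries such as aiohttp, aiomysql, and aioredis
Production practice: async context managers, graceful shutdown, and bridging with synchronous code

Coroutines are the foundation of modern Python web frameworks such as FastAPI and Starlette. Once you master asyncio, you have a core skill for building high-performance Python services.

Python Backend Development Blog Series | Author: 耿雨飞

This is Part 09 of a 25-part series. For the full table of contents, see "Python Technical Blog Series Outline".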
