Python Concurrency in Practice: Threads, Coroutines, and Multiprocessing Explained

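🎯 Audience: backend engineers with Java / Go concurrency experience who moved to Python and find the GIL and coroutines confusing

⏱️ Reading time: about 60 minutes

💬 In one sentence: from how the GIL works under the hood to the full usage of Python's three concurrency tools, covering synchronization primitives, async patterns, and inter-process communication, with data engineering scenarios used throughout


Nearly everyone who moves from Java or Go to Python for concurrent programming falls into the same trap: the multithreaded code they write runs slower than the single-threaded version.

That is not your fault; it is a consequence of how Python was designed.

But the conclusion "Python threads are useless" is only half right: once you understand where the GIL's limits lie and pick the right tool, Python concurrency can power high-performance systems too.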


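I. What Makes Python Concurrency "Special": Resetting Your Mental Model

1.1 The GIL: The Global Interpreter Lock You Cannot Avoid

The GIL (Global Interpreter Lock) is a global mutex inside the CPython interpreter. It has a single rule: at any moment, only one thread may execute Python bytecode.

Why does the GIL exist? CPython manages memory with reference counting: every object tracks how many references point to it and is freed as soon as the count drops to zero. Updating a reference count is not an atomic operation, so threads modifying it simultaneously could corrupt the count, leading to memory leaks or double-free crashes. The GIL is a blunt but effective fix: if threads never run Python bytecode at the same time, the reference counts cannot race.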
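
To make the reference-counting point concrete, here is a minimal sketch that watches one object's count change as names are bound and unbound. It relies on `sys.getrefcount`, which is CPython-specific; the variable names are illustrative only, and the absolute numbers may differ between interpreter versions, so only the relative changes matter.

```python
import sys

payload = object()
# getrefcount reports one extra reference: the temporary argument of the call itself
base = sys.getrefcount(payload)

alias = payload                      # bind a second name to the same object
after_alias = sys.getrefcount(payload)

del alias                            # unbinding the name decrements the count again
after_del = sys.getrefcount(payload)

print(f"base={base}, after alias={after_alias}, after del={after_del}")
```

Every one of these increments and decrements is a read-modify-write on a shared field, which is exactly the kind of update the GIL protects from interleaving.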

The diagram below contrasts the thread scheduling models of Java and Python:
[Figure: thread scheduling, Python vs. Java]
Python multithreading (pseudo-parallel): Thread 1, Thread 2, Thread 3 → GIL (held by only one thread at a time) → CPU core (only one in use at a time)
Java multithreading (true parallelism): Thread 1 → runs Java code → CPU core 1; Thread 2 → runs Java code → CPU core 2; Thread 3 → runs Java code → CPU core 3

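🤔 My take: Java threads run in true parallel: several threads can execute on different CPU cores at once. Python threads are closer to "taking turns": they queue up to grab one lock, and at any moment only one thread is actually executing Python code.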

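1.2 How the GIL Gets Released: The Check Interval and the Switch Interval

The GIL is not held forever. CPython has a forced-release mechanism, originally the check interval and, since Python 3.2, the switch interval:

  • Before Python 3.2: a GIL switch check was forced every 100 bytecode "ticks" (roughly, every 100 instructions)
  • Python 3.2 and later: time-based; a thread waiting for the GIL asks the running thread to release it after 5 ms by default (read the value with sys.getswitchinterval(), change it with sys.setswitchinterval())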

import sys

sys.getswitchinterval()  # 0.005 (5 ms) by default

# Experiment: lower the interval so threads switch more often (for debugging only; don't do this in production)
sys.setswitchinterval(0.001)  # 1 ms

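Besides the switch interval, the GIL is also released voluntarily in two situations:

  1. During I/O: a thread that issues a network request or reads/writes a file releases the GIL so other threads can run in the meantime
  2. Inside C extensions that release the GIL: while low-level C code such as NumPy matrix operations, time.sleep(), or hashlib is executing, the GIL is temporarily released

⚠️ An easy point to get wrong: NumPy's C-level computation releases the GIL, so heavy NumPy work plus multithreading does pay off. A CPU-bound loop in pure Python never releases the GIL voluntarily, so multithreading will not speed it up.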

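import numpy as np
import threading
import time

# NumPy matrix multiplication: runs in C and releases the GIL → multithreading helps
def numpy_work():
    a = np.random.rand(2000, 2000)
    np.dot(a, a)

# Pure Python loop: never leaves the interpreter, so the GIL is not released → multithreading does not help
def pure_python_work():
    result = 0
    for i in range(10_000_000):
        result += i

# Try it yourself: the threaded numpy_work should typically beat the serial run; the threaded pure_python_work will not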
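
For readers who want to run that comparison, here is one possible timing harness for the pure-Python case. It is a sketch under stated assumptions: `count_up`, `run_sequential`, and `run_threaded` are illustrative names, the loop size is kept small so it finishes quickly, and the exact timings depend on your machine and interpreter build. On a standard GIL-enabled CPython the threaded run is usually no faster than the sequential one.

```python
import threading
import time

def count_up(n: int) -> int:
    """CPU-bound pure-Python loop; it never releases the GIL voluntarily."""
    total = 0
    for i in range(n):
        total += i
    return total

def run_sequential(n: int, jobs: int):
    start = time.perf_counter()
    results = [count_up(n) for _ in range(jobs)]
    return results, time.perf_counter() - start

def run_threaded(n: int, jobs: int):
    results = [None] * jobs

    def target(idx: int) -> None:
        results[idx] = count_up(n)   # each thread writes to its own slot

    threads = [threading.Thread(target=target, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

seq_results, seq_time = run_sequential(500_000, 4)
thr_results, thr_time = run_threaded(500_000, 4)
print(f"sequential: {seq_time:.3f}s, threaded: {thr_time:.3f}s")
```

Swapping `count_up` for a NumPy-heavy function is the natural next experiment, keeping in mind that a multithreaded BLAS backend may already use several cores on its own.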

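1.3 The Three Concurrency Tools at a Glance

| Tool | Concurrency model | Best for | Bypasses the GIL? |
| --- | --- | --- | --- |
| threading | OS threads, interleaved under the GIL | I/O-bound work, legacy synchronous code | ❌ (GIL released during I/O waits) |
| asyncio | Single-threaded event loop, cooperative coroutine scheduling | I/O-bound work, high concurrency | ❌ (single thread, so not needed) |
| multiprocessing | Multiple processes, each with its own GIL | CPU-bound work, data parallelism | ✅ (separate processes) |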

📝 All example code in this article targets Python 3.11+. Some of the newer features used here (TaskGroup, ExceptionGroup, asyncio.timeout) require Python 3.11 or later.


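II. threading: Multithreading in Detail

The most common I/O-bound tasks in data engineering are bulk calls to external APIs, concurrent file reads and writes, and concurrent database queries. What they share is that most of the time goes to waiting rather than computing.

2.1 The Thread Lifecycle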

[Figure: thread state diagram]
Thread() → New
New → Ready: start()
Ready → Running: acquires the GIL
Running → Ready: GIL taken away / check interval
Running → Blocked: I/O wait / sleep / Lock
Blocked → Ready: I/O done / woken up / lock acquired
Running → Finished: run() returns

import threading
import time

def worker(name: str, duration: float):
    print(f"[{name}] started, thread ID: {threading.current_thread().ident}")
    time.sleep(duration)
    print(f"[{name}] finished")

# Basic usage
t = threading.Thread(target=worker, args=("Task A", 1.0), daemon=True)
t.start()
t.join(timeout=5)  # wait at most 5 seconds
print(f"Thread still alive: {t.is_alive()}")

# Subclassing Thread (useful when the thread needs to carry more state)
class DataFetchThread(threading.Thread):
    def __init__(self, source: str):
        super().__init__(daemon=True)
        self.source = source
        self.result = None
        self.error = None

    def run(self):
        try:
            # simulate fetching data
            time.sleep(0.5)
            self.result = {"source": self.source, "data": [1, 2, 3]}
        except Exception as e:
            self.error = e

threads = [DataFetchThread(f"source_{i}") for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()

for t in threads:
    if t.error:
        print(f"❌ {t.source}: {t.error}")
    else:
        print(f"✅ {t.source}: {t.result}")

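💡 Daemon threads: with daemon=True, these threads are killed abruptly when the main program exits, so no manual join is required. That suits background work (heartbeats, log flushing, and so on), but note that they get no chance to clean up.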
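
When a background thread does need to clean up, one common alternative is a regular (non-daemon) thread that is stopped cooperatively. The sketch below assumes a hypothetical `heartbeat` worker and uses a `threading.Event` as the stop flag; the `log` list stands in for whatever cleanup the real task would perform.

```python
import threading
import time

stop_requested = threading.Event()
log = []

def heartbeat(interval: float) -> None:
    """Hypothetical background task that exits cleanly once asked to stop."""
    try:
        # Event.wait returns False on timeout and True once the event is set
        while not stop_requested.wait(timeout=interval):
            log.append("beat")
    finally:
        log.append("cleanup")        # runs because the thread ends normally

worker = threading.Thread(target=heartbeat, args=(0.01,))   # not a daemon
worker.start()
time.sleep(0.05)
stop_requested.set()                 # ask the thread to stop
worker.join(timeout=1)               # then wait for it to finish
print(log)
```

The trade-off is that the main program now has to remember to set the event and join; in exchange, the `finally` block is guaranteed a chance to run.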

2.2 ThreadPoolExecutor: The Recommended Way to Use a Thread Pool

Managing threads directly with Thread is tedious; concurrent.futures.ThreadPoolExecutor is the higher-level, recommended approach:

import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

DATA_SOURCES = [
    {"name": "Taobao", "url": "https://httpbin.org/delay/1"},
    {"name": "JD", "url": "https://httpbin.org/delay/1"},
    {"name": "Pinduoduo", "url": "https://httpbin.org/delay/1"},
    {"name": "Douyin", "url": "https://httpbin.org/delay/1"},
    {"name": "Kuaishou", "url": "https://httpbin.org/delay/1"},
]

def fetch_sales_data(source: dict) -> dict:
    """Synchronously fetch sales data from a single source."""
    start = time.time()
    response = requests.get(source["url"], timeout=10)
    elapsed = time.time() - start
    return {
        "source": source["name"],
        "status": response.status_code,
        "elapsed": round(elapsed, 2),
    }

def fetch_all_threading(sources: list, max_workers: int = 5) -> list:
    """Fetch all sources concurrently with a thread pool."""
    results = []
    start = time.time()

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() schedules a task and returns a Future
        future_to_source = {
            executor.submit(fetch_sales_data, source): source
            for source in sources
        }

        # as_completed iterates in completion order (not submission order)
        for future in as_completed(future_to_source):
            source = future_to_source[future]
            try:
                result = future.result()
                results.append(result)
                print(f"✅ {source['name']} done in {result['elapsed']}s")
            except Exception as e:
                print(f"❌ {source['name']} failed: {e}")
                results.append({"source": source["name"], "error": str(e)})

    total = round(time.time() - start, 2)
    print(f"\nTotal time: {total}s (about 5s if run serially)")
    return results

The Future Object in Detail

The Future returned by executor.submit() is a handle to the task, which is more flexible than the bare result:

from concurrent.futures import ThreadPoolExecutor, Future
import time

def slow_task(x: int) -> int:
    time.sleep(x)
    if x == 3:
        raise ValueError("x must not be 3")
    return x * 10

with ThreadPoolExecutor(max_workers=4) as executor:
    futures: list[Future] = [executor.submit(slow_task, i) for i in range(5)]

    # Check the state right away (non-blocking)
    for f in futures:
        print(f.running(), f.done())   # currently running? already done?

    # Add a callback: it runs in the worker thread that completes the future
    # (or immediately in the calling thread if the future is already done), so mind thread safety
    def on_done(future: Future):
        if future.exception():
            print(f"callback: task failed → {future.exception()}")
        else:
            print(f"callback: task succeeded → {future.result()}")

    for f in futures:
        f.add_done_callback(on_done)

    # result() blocks until the task finishes; a timeout can be set
    # (on Python 3.11+, concurrent.futures.TimeoutError is the built-in TimeoutError)
    try:
        result = futures[0].result(timeout=5)
    except TimeoutError:
        print("Timed out")
    except Exception as e:
        print(f"Task raised: {e}")

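💡 as_completed is more flexible than executor.map: it yields results in completion order, so you can handle whichever task finishes first without waiting for the slowest one. executor.map yields results in submission order, so if the first task is the slowest, every later result waits behind it even though those tasks may already be done.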
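
The ordering difference is easy to observe with tasks of unequal duration. The following sketch uses a made-up `wait_and_return` helper and three deliberately different delays; the delays are spaced far enough apart that completion order should be stable on an ordinary machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

DELAYS = [0.3, 0.05, 0.15]   # the first submitted task is the slowest

def wait_and_return(delay: float) -> float:
    time.sleep(delay)
    return delay

# executor.map: results come back in submission order
with ThreadPoolExecutor(max_workers=3) as executor:
    map_order = list(executor.map(wait_and_return, DELAYS))

# as_completed: results come back in the order the tasks finish
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(wait_and_return, d) for d in DELAYS]
    completion_order = [f.result() for f in as_completed(futures)]

print("map:         ", map_order)
print("as_completed:", completion_order)
```

Both versions take roughly as long as the slowest task; what changes is how soon the fast results become available to the caller.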

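2.3 Synchronization Primitives: Locks, Semaphores, Events

When threads share state, synchronization primitives are needed to avoid race conditions.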
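
As a quick illustration of such a race, the sketch below performs a read-modify-write on a shared counter with and without a lock. The `Counter` class and the `hammer` helper are hypothetical, and the `time.sleep(0)` call is there only to widen the race window artificially so that lost updates are likely to show up; how many updates are lost varies from run to run.

```python
import threading
import time

class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        current = self.value          # read
        time.sleep(0)                 # give other threads a chance to run in between
        self.value = current + 1      # write back a possibly stale value

    def safe_increment(self) -> None:
        with self._lock:              # the whole read-modify-write is one critical section
            current = self.value
            time.sleep(0)
            self.value = current + 1

def hammer(method, n_threads: int = 8, per_thread: int = 200) -> None:
    def loop() -> None:
        for _ in range(per_thread):
            method()
    threads = [threading.Thread(target=loop) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

unsafe, safe = Counter(), Counter()
hammer(unsafe.unsafe_increment)
hammer(safe.safe_increment)
print(f"expected 1600, unsafe got {unsafe.value}, safe got {safe.value}")
```

The GIL does not prevent this kind of bug: it protects the interpreter's internals, not the atomicity of multi-step operations in your own code.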

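Lock and RLock

import threading

# ---- Lock: mutual exclusion ----
lock = threading.Lock()
results = []

def do_something(data):
    """Placeholder for the real per-item processing (hypothetical helper)."""
    return data

def good_worker(data):
    processed = do_something(data)
    with lock:             # enter the critical section
        results.append(processed)
    # leaving the critical section releases the lock automatically

# ❌ Bad: using Lock where RLock is needed
def bad_nested(lock):
    with lock:
        with lock:         # Lock is not reentrant: this deadlocks!
            pass

# ✅ Correct: use RLock (reentrant lock) when re-entry is needed
rlock = threading.RLock()
def safe_nested():
    with rlock:
        with rlock:        # the same thread may acquire an RLock repeatedly without deadlocking
            pass

Semaphore: Limiting Concurrency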

Semaphore is a more general primitive than Lock: it lets up to N threads into the critical section at once. It is commonly used for throttling.

import threading
import time

# Allow at most 3 threads to call the external API at once (to avoid being rate-limited)
api_semaphore = threading.Semaphore(3)

def call_external_api(task_id: int):
    with api_semaphore:
        print(f"Task {task_id} calling the API")
        time.sleep(0.5)    # simulate API latency
        print(f"Task {task_id} done")

threads = [threading.Thread(target=call_external_api, args=(i,)) for i in range(10)]
for t in threads: t.start()
for t in threads: t.join()
# Watch the output: at most 3 tasks are ever inside the API call at the same time

Event: Signaling Between Threads

import threading
import time

# Scenario: once the data is loaded, tell all waiting processor threads to start
data_ready = threading.Event()
shared_data = None

def data_loader():
    global shared_data
    print("Loading data...")
    time.sleep(2)
    shared_data = [1, 2, 3, 4, 5]
    data_ready.set()       # signal: the data is ready
    print("Data loaded, signal sent")

def data_processor(name: str):
    print(f"[{name}] waiting for data...")
    data_ready.wait()      # block until the signal arrives
    print(f"[{name}] got the signal, processing: {shared_data}")

loader = threading.Thread(target=data_loader)
processors = [threading.Thread(target=data_processor, args=(f"processor-{i}",)) for i in range(3)]

for p in processors: p.start()
loader.start()

loader.join()
for p in processors: p.join()

Condition: Fine-Grained Conditional Waiting

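Condition is a more powerful synchronization primitive that lets a thread wait until some condition holds, which makes it a good fit for the producer-consumer pattern: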

import threading
import time
from collections import deque

class BoundedQueue:
    """Bounded queue: a producer-consumer model."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self._queue = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            while len(self._queue) >= self.maxsize:
                print("Queue full, producer waiting...")
                self._cond.wait()           # release the lock and wait
            self._queue.append(item)
            self._cond.notify_all()         # tell consumers there is new data

    def get(self):
        with self._cond:
            while len(self._queue) == 0:
                print("Queue empty, consumer waiting...")
                self._cond.wait()
            item = self._queue.popleft()
            self._cond.notify_all()         # tell producers there is free space
            return item

# Usage
q = BoundedQueue(maxsize=3)

def producer():
    for i in range(6):
        q.put(i)
        print(f"produced: {i}")
        time.sleep(0.1)

def consumer(name: str):
    for _ in range(3):
        item = q.get()
        print(f"[{name}] consumed: {item}")
        time.sleep(0.3)

t_prod = threading.Thread(target=producer)
t_cons1 = threading.Thread(target=consumer, args=("consumer-A",))
t_cons2 = threading.Thread(target=consumer, args=("consumer-B",))

t_prod.start(); t_cons1.start(); t_cons2.start()
t_prod.join(); t_cons1.join(); t_cons2.join()

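2.4 Thread-Local Storage: threading.local

Sometimes each thread needs its own independent copy of a variable (a database connection, for example). That is what threading.local is for: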

import threading
import time

# Scenario: each thread keeps its own database connection, avoiding the problems of sharing one across threads
thread_local = threading.local()

def get_db_connection():
    """Return the current thread's database connection, creating it if needed."""
    if not hasattr(thread_local, 'db_conn'):
        # Each thread creates its own connection on first use
        thread_local.db_conn = f"Connection-{threading.current_thread().name}"
        print(f"[{threading.current_thread().name}] created connection: {thread_local.db_conn}")
    return thread_local.db_conn

def worker_task(task_id: int):
    conn = get_db_connection()   # this thread's connection
    time.sleep(0.1)
    conn2 = get_db_connection()  # same thread, so the existing connection is reused
    assert conn is conn2         # within one thread it is always the same connection object
    print(f"Task {task_id} using connection: {conn}")

threads = [threading.Thread(target=worker_task, args=(i,), name=f"Thread-{i}") for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
# Output: each thread creates its connection exactly once, independently of the others

2.5 Thread-Safe Queues

queue.Queue is a thread-safe queue with built-in locking, and the first choice for distributing work across threads:

import threading
import queue
import time

def producer(q: queue.Queue, items: list, n_consumers: int):
    for item in items:
        q.put(item)
        print(f"produced: {item}")
        time.sleep(0.05)
    for _ in range(n_consumers):
        q.put(None)  # one sentinel per consumer, telling it to stop

def consumer(q: queue.Queue, name: str):
    while True:
        try:
            item = q.get(timeout=3)  # wait at most 3 seconds
        except queue.Empty:
            print(f"[{name}] timed out, exiting")
            break

        if item is None:
            q.task_done()    # the sentinel counts as an item too, otherwise q.join() never returns
            break

        # process the task
        time.sleep(0.1)
        print(f"[{name}] processed: {item}")
        q.task_done()    # mark the task as done

task_queue = queue.Queue(maxsize=10)
tasks = list(range(20))

cons_list = [threading.Thread(target=consumer, args=(task_queue, f"consumer-{i}")) for i in range(3)]
prod = threading.Thread(target=producer, args=(task_queue, tasks, len(cons_list)))

prod.start()
for c in cons_list: c.start()

prod.join()
for c in cons_list: c.join()

task_queue.join()  # returns once task_done() has been called for every item that was put
print("All tasks processed")

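III. asyncio: Asynchronous Programming in Detail

3.1 How the Event Loop Schedules Work

asyncio uses an entirely different concurrency model: a single thread plus cooperative scheduling.

Core roles:

  • Event Loop: the scheduler, which keeps checking which coroutines are ready to continue
  • Coroutine: a function defined with async def that can voluntarily yield control at an await
  • Task: a coroutine handed to the Event Loop; create_task() schedules it to start running at the loop's next opportunity
  • Future: represents the eventual result of an asynchronous operation; Task is a subclass of Future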

[Figure: sequence diagram. Participants: Event Loop (single thread), coroutine fetching Taobao data, coroutine fetching JD data, coroutine fetching Pinduoduo data, network I/O]
t = 0 ms: 5 coroutines are submitted at once
For each coroutine in turn: the Event Loop runs it → it issues an HTTP request (non-blocking) → it hits await and yields control
t ≈ 1000 ms: the endpoints respond one after another
C2's response arrives → C2 resumes and handles the response
C1's response arrives → C1 resumes and handles the response
Total time ≈ the time of the slowest endpoint

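Key insight: a coroutine gives up control voluntarily at an await; it is not preempted by the operating system. That is what "cooperative scheduling" means. It is lighter than multithreading because switching between coroutines involves no OS thread scheduling and no kernel-level context switch.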
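
Cooperative scheduling can be observed directly by recording the order in which two coroutines make progress. In this small sketch, `stepper` is an illustrative coroutine and `await asyncio.sleep(0)` serves as an explicit yield point; with the standard event loop the two coroutines take turns at every such point.

```python
import asyncio

trace = []

async def stepper(name: str, steps: int) -> None:
    for i in range(steps):
        trace.append(f"{name}{i}")
        await asyncio.sleep(0)    # explicit yield point: hand control back to the loop

async def main() -> None:
    # gather wraps both coroutines in tasks and runs them on the same thread
    await asyncio.gather(stepper("A", 3), stepper("B", 3))

asyncio.run(main())
print(trace)
```

If the `await` line is removed, each coroutine runs to completion before the other one starts, which is also why a long blocking call inside a coroutine stalls every other task on the loop.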

3.2 async/await Basics

import asyncio

# async def defines a coroutine function; calling it returns a coroutine object (nothing runs yet)
async def say_hello(name: str, delay: float):
    await asyncio.sleep(delay)   # yield control so the Event Loop can run other coroutines
    print(f"Hello, {name}!")
    return f"done: {name}"

# Three ways to run a coroutine:
# 1. asyncio.run(): the program entry point; creates and runs the Event Loop
async def main():
    # 2. await: wait for a single coroutine directly (sequential)
    result = await say_hello("Alice", 1.0)

    # 3. create_task(): run several coroutines concurrently (scheduled without waiting)
    task1 = asyncio.create_task(say_hello("Bob", 0.5))
    task2 = asyncio.create_task(say_hello("Charlie", 0.3))

    # Wait for all tasks to finish
    results = await asyncio.gather(task1, task2)
    print(results)  # ['done: Bob', 'done: Charlie']

asyncio.run(main())

3.3 gather, wait, and TaskGroup

asyncio offers several ways to run coroutines concurrently, each with different error-handling semantics:

import asyncio

async def risky_task(name: str, should_fail: bool):
    await asyncio.sleep(0.5)
    if should_fail:
        raise ValueError(f"{name} failed")
    return f"{name} succeeded"

async def demo_gather():
    # gather: wait for everything; with return_exceptions=True one failure does not interrupt the others
    results = await asyncio.gather(
        risky_task("A", False),
        risky_task("B", True),   # this one fails
        risky_task("C", False),
        return_exceptions=True   # exceptions are returned as ordinary values instead of being raised
    )
    for r in results:
        if isinstance(r, Exception):
            print(f"❌ failed: {r}")
        else:
            print(f"✅ {r}")

async def demo_wait():
    # wait: finer control; return on "first completed", "first exception", or "all completed"
    tasks = [asyncio.create_task(risky_task(f"task-{i}", i == 2)) for i in range(5)]

    done, pending = await asyncio.wait(
        tasks,
        return_when=asyncio.FIRST_EXCEPTION   # return as soon as any task raises
    )

    print(f"done: {len(done)}, pending: {len(pending)}")

    # Cancel whatever has not finished
    for t in pending:
        t.cancel()

async def demo_taskgroup():
    # TaskGroup (Python 3.11+): the recommended newer API; if any task fails, the rest are cancelled
    try:
        async with asyncio.TaskGroup() as tg:
            t1 = tg.create_task(risky_task("X", False))
            t2 = tg.create_task(risky_task("Y", True))   # fails
            t3 = tg.create_task(risky_task("Z", False))
        # Reached only if every task succeeded
    except* ValueError as eg:
        # except* catches an ExceptionGroup (Python 3.11+ syntax)
        print(f"{len(eg.exceptions)} task(s) failed:")
        for e in eg.exceptions:
            print(f"  - {e}")

asyncio.run(demo_gather())

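🤔 Which one should you use? On Python 3.11+, prefer TaskGroup: the semantics are clearer, and a failure in any child task automatically cancels the others, so you never end up with tasks still running without your knowledge. On older versions, use gather(return_exceptions=True).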
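
The "tasks still running without your knowledge" problem can be reproduced with plain `gather`. The sketch below is an illustration with made-up coroutine names (`slow_ok`, `fail_fast`): when one awaitable raises, `gather` propagates the exception right away, but the sibling is not cancelled and keeps running in the background.

```python
import asyncio

finished = []

async def slow_ok(name: str, delay: float) -> None:
    await asyncio.sleep(delay)
    finished.append(name)

async def fail_fast() -> None:
    await asyncio.sleep(0.01)
    raise ValueError("boom")

async def main():
    seen_at_failure = None
    try:
        # No return_exceptions here: the first exception propagates immediately
        await asyncio.gather(slow_ok("sibling", 0.05), fail_fast())
    except ValueError:
        seen_at_failure = list(finished)   # the sibling has not finished yet
    await asyncio.sleep(0.1)               # keep the loop alive a little longer
    return seen_at_failure, list(finished)

at_failure, later = asyncio.run(main())
print("finished when the error surfaced:", at_failure)
print("finished shortly afterwards:     ", later)
```

With a TaskGroup on Python 3.11+, the sibling would instead be cancelled as soon as the failure is detected.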

3.4 Timeouts and Task Cancellation

import asyncio

async def slow_api_call(name: str) -> str:
    await asyncio.sleep(5)
    return f"data from {name}"

async def fetch_with_timeout():
    # Option 1: asyncio.wait_for with a timeout
    try:
        result = await asyncio.wait_for(
            slow_api_call("Taobao"),
            timeout=2.0
        )
    except asyncio.TimeoutError:
        print("Timed out, falling back to a default value")
        result = {}

    # Option 2: asyncio.timeout (Python 3.11+), the more Pythonic form.
    # TimeoutError is raised as the block exits, so the try must wrap the async with.
    try:
        async with asyncio.timeout(2.0):
            result = await slow_api_call("JD")
    except asyncio.TimeoutError:
        print("JD timed out")

async def demo_cancel():
    """Cancel a task explicitly."""
    task = asyncio.create_task(slow_api_call("Pinduoduo"))

    await asyncio.sleep(1)  # wait 1 second
    task.cancel()           # request cancellation

    try:
        await task          # wait for the cancellation to complete
    except asyncio.CancelledError:
        print("Task was cancelled")
        # Clean up here, then decide whether to re-raise

asyncio.run(fetch_with_timeout())
asyncio.run(demo_cancel())
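
Cleanup can also live inside the coroutine being cancelled. The sketch below assumes a hypothetical `export_rows` coroutine that must roll back and release a resource when cancelled; it catches `CancelledError`, records the rollback, and re-raises so the task is still reported as cancelled.

```python
import asyncio

events = []

async def export_rows() -> None:
    """Hypothetical long-running job that must clean up when cancelled."""
    try:
        await asyncio.sleep(10)            # stands in for the real work
    except asyncio.CancelledError:
        events.append("rollback")          # undo partial work
        raise                              # re-raise so the cancellation is not swallowed
    finally:
        events.append("closed")            # runs on success, failure, and cancellation

async def main() -> bool:
    task = asyncio.create_task(export_rows())
    await asyncio.sleep(0.01)              # let the task start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancel observed")
    return task.cancelled()

was_cancelled = asyncio.run(main())
print(events, was_cancelled)
```

Swallowing `CancelledError` without re-raising is usually a mistake: the caller would then see the task as having finished normally.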

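3.5 asyncio's Synchronization Primitives

asyncio has its own set of synchronization primitives. They are used much like their threading counterparts, but they are coroutine-safe, not thread-safe: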

import asyncio

# ---- asyncio.Lock: mutual exclusion ----
lock = asyncio.Lock()
counter = 0

async def increment():
    global counter
    async with lock:
        current = counter
        await asyncio.sleep(0)   # yield control to simulate contention
        counter = current + 1

# ---- asyncio.Semaphore: limiting concurrency ----
# Scenario: bulk API calls with at most 10 requests in flight at any time
# (a semaphore caps concurrency, not requests per second)
rate_limiter = asyncio.Semaphore(10)

async def controlled_request(session, url: str):
    async with rate_limiter:       # beyond 10 concurrent requests, later coroutines wait here
        async with session.get(url) as resp:
            return await resp.json()

# ---- asyncio.Queue: passing data between coroutines ----
async def producer(q: asyncio.Queue):
    for i in range(10):
        await q.put(i)
        print(f"produced: {i}")
        await asyncio.sleep(0.1)
    await q.put(None)   # sentinel

async def consumer(q: asyncio.Queue, name: str):
    while True:
        item = await q.get()
        if item is None:
            await q.put(None)  # pass the sentinel on to the other consumers
            break
        print(f"[{name}] consumed: {item}")
        await asyncio.sleep(0.2)
        q.task_done()

async def pipeline():
    q = asyncio.Queue(maxsize=5)   # bounded queue: a producer that runs ahead has to wait
    await asyncio.gather(
        producer(q),
        consumer(q, "consumer-A"),
        consumer(q, "consumer-B"),
    )

asyncio.run(pipeline())
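
To check that a semaphore really caps concurrency, one option is to track how many coroutines are inside the guarded section at once. This is a minimal sketch with illustrative names (`LIMIT`, `stats`, `limited_job`); the short sleep stands in for a network call.

```python
import asyncio

LIMIT = 3
stats = {"in_flight": 0, "peak": 0}

async def limited_job(sem: asyncio.Semaphore) -> None:
    async with sem:                          # at most LIMIT coroutines get past this line
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)            # stands in for the real request
        stats["in_flight"] -= 1

async def main() -> None:
    sem = asyncio.Semaphore(LIMIT)           # created inside the running loop
    await asyncio.gather(*(limited_job(sem) for _ in range(10)))

asyncio.run(main())
print(f"peak concurrency: {stats['peak']} (limit {LIMIT})")
```

No lock is needed around the counter updates because there is no await between the read and the write, so no other coroutine can run in between.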

3.6 async for and async with

Python's dunder (magic) methods have asynchronous counterparts in async code:

python 复制代码
import asyncio

# ---- async with:异步上下文管理器 ----
class AsyncDBConnection:
    async def __aenter__(self):
        print("异步连接数据库")
        await asyncio.sleep(0.1)   # 模拟建立连接耗时
        return self

    async def __aexit__(self, *args):
        print("关闭数据库连接")
        await asyncio.sleep(0.05)

    async def query(self, sql: str):
        await asyncio.sleep(0.1)
        return [{"id": 1, "name": "test"}]

async def fetch_orders():
    async with AsyncDBConnection() as conn:
        result = await conn.query("SELECT * FROM orders")
        return result

# ---- async for:异步迭代器 ----
class AsyncPagedAPI:
    """模拟分页 API,每页需要发一次网络请求"""

    def __init__(self, total_pages: int):
        self.total_pages = total_pages
        self.current_page = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current_page >= self.total_pages:
            raise StopAsyncIteration
        self.current_page += 1
        await asyncio.sleep(0.1)  # 模拟网络请求
        return {"page": self.current_page, "data": list(range(10))}

async def fetch_all_pages():
    all_data = []
    async for page in AsyncPagedAPI(total_pages=5):
        print(f"获取第 {page['page']} 页,{len(page['data'])} 条数据")
        all_data.extend(page["data"])
    return all_data

# ---- 异步生成器(更简洁的写法)----
async def async_page_generator(total_pages: int):
    for page in range(1, total_pages + 1):
        await asyncio.sleep(0.1)
        yield {"page": page, "data": list(range(10))}

async def use_async_generator():
    async for page in async_page_generator(5):
        print(f"页 {page['page']}: {len(page['data'])} 条")

asyncio.run(use_async_generator())
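
Just as an async generator shortens the iterator class, `contextlib.asynccontextmanager` from the standard library can shorten the context-manager class. A minimal sketch, assuming a hypothetical `db_connection` helper that records its steps in a list instead of talking to a real database:

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def db_connection(log: list):
    """Hypothetical connection helper: setup before the yield, teardown after it."""
    log.append("connect")
    await asyncio.sleep(0)          # stand-in for the real connection handshake
    try:
        yield {"connected": True}   # the value bound by `async with ... as conn`
    finally:
        await asyncio.sleep(0)      # stand-in for the real close call
        log.append("close")         # runs even if the body raises

async def demo() -> list:
    log: list = []
    async with db_connection(log) as conn:
        assert conn["connected"]
        log.append("query")
    return log

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The `try`/`finally` around the `yield` plays the role of `__aexit__`: the cleanup runs whether or not the body of the `async with` block raises.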

3.7 asyncio in practice: fetching multiple data sources concurrently

python
import asyncio
import aiohttp   # pip install aiohttp
import time

DATA_SOURCES = [
    {"name": "Taobao", "url": "https://httpbin.org/delay/1"},
    {"name": "JD.com", "url": "https://httpbin.org/delay/1"},
    {"name": "Pinduoduo", "url": "https://httpbin.org/delay/1"},
    {"name": "Douyin", "url": "https://httpbin.org/delay/1"},
    {"name": "Kuaishou", "url": "https://httpbin.org/delay/1"},
]

async def fetch_with_retry(
    session: aiohttp.ClientSession,
    source: dict,
    max_retries: int = 3,
    timeout: float = 5.0
) -> dict:
    """Fetch data with retries and a timeout"""
    for attempt in range(max_retries):
        try:
            async with asyncio.timeout(timeout):   # asyncio.timeout requires Python 3.11+
                async with session.get(source["url"]) as response:
                    data = await response.json()
                    return {
                        "source": source["name"],
                        "status": response.status,
                        "data": data,
                    }
        except asyncio.TimeoutError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt   # exponential backoff
                print(f"⚠️ {source['name']} timed out, retrying in {wait_time}s (attempt {attempt + 1})")
                await asyncio.sleep(wait_time)
            else:
                return {"source": source["name"], "error": f"timed out after {max_retries} attempts"}
        except aiohttp.ClientError as e:
            return {"source": source["name"], "error": str(e)}

async def fetch_all_async(
    sources: list,
    concurrency: int = 5   # cap the number of in-flight requests to avoid rate limiting
) -> list:
    """Fetch all data sources concurrently, with a concurrency limit"""
    semaphore = asyncio.Semaphore(concurrency)
    start = time.time()

    async def fetch_with_limit(session, source):
        async with semaphore:
            return await fetch_with_retry(session, source)

    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_with_limit(session, source) for source in sources]
        results = await asyncio.gather(*tasks, return_exceptions=True)

    total = round(time.time() - start, 2)
    successes = sum(1 for r in results if isinstance(r, dict) and "error" not in r)
    print(f"\nTotal time: {total}s, succeeded {successes}/{len(sources)}")
    return results

if __name__ == "__main__":
    asyncio.run(fetch_all_async(DATA_SOURCES))
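
`asyncio.gather` returns everything at once, in submission order. When results should be handled as soon as each source responds, `asyncio.as_completed` is an alternative. The sketch below replaces the HTTP call with a hypothetical `fake_fetch` that just sleeps, so it runs without network access; the source names and delays are made up for illustration.

```python
import asyncio

async def fake_fetch(name: str, delay: float) -> dict:
    """Hypothetical stand-in for a network request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return {"source": name, "delay": delay}

async def collect_in_completion_order(jobs: list) -> list:
    """Return source names in the order their requests finished."""
    tasks = [asyncio.create_task(fake_fetch(name, delay)) for name, delay in jobs]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        result = await next_done        # resolves to whichever task finishes next
        finished.append(result["source"])
    return finished

if __name__ == "__main__":
    jobs = [("slow", 0.3), ("fast", 0.01), ("medium", 0.15)]
    print(asyncio.run(collect_in_completion_order(jobs)))
```

The trade-off is that completion order no longer matches submission order, so each result has to carry enough information (here the `source` key) to identify itself.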

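3.8 threading vs asyncio: a direct comparison

| Dimension | threading | asyncio |
| --- | --- | --- |
| Concurrency model | Multiple threads, switched by the OS scheduler | Single thread, cooperative switching |
| Memory overhead | Roughly 1-8MB of stack space per thread | Coroutines are very lightweight, roughly 1KB each |
| Maximum concurrency | Bounded by system thread limits (typically a few hundred) | Tens of thousands in theory |
| Thread safety | Manual locking required | Single thread; no races between await points |
| Legacy synchronous code | ✅ Works as-is | ❌ Must be rewritten as async |
| Debugging difficulty | High (race conditions are hard to reproduce) | Low (execution order is predictable) |
| Best suited for | Codebases with many legacy synchronous libraries | New projects, high-concurrency I/O |

A simple rule for choosing

  • You use synchronous libraries such as requests or pymysql and do not plan to rewrite them soon → use threading
  • It is a new project, or you are willing to adopt asynchronous libraries such as aiohttp or asyncpg → use asyncio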
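
The "no races between await points" property deserves a concrete illustration, because it cuts both ways: code with no `await` between a read and a write cannot be interleaved, but an `await` in the middle reintroduces lost updates. A minimal sketch, using a hypothetical `Counter` class and two increment coroutines written for this example:

```python
import asyncio

class Counter:
    def __init__(self) -> None:
        self.value = 0

async def safe_increment(counter: Counter) -> None:
    # No await between the read and the write, so no other coroutine can run in between
    counter.value += 1

async def unsafe_increment(counter: Counter) -> None:
    current = counter.value        # read
    await asyncio.sleep(0)         # yields to the event loop; other coroutines run here
    counter.value = current + 1    # write based on a stale read

async def run_many(increment, n: int = 100) -> int:
    counter = Counter()
    await asyncio.gather(*(increment(counter) for _ in range(n)))
    return counter.value

if __name__ == "__main__":
    print("safe:", asyncio.run(run_many(safe_increment)))      # every update is kept
    print("unsafe:", asyncio.run(run_many(unsafe_increment)))  # updates are lost
```

When shared state has to be updated across an `await`, `asyncio.Lock` serves the same purpose that `threading.Lock` does for threads.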

4. multiprocessing: multiple processes in depth

4.1 Why neither threads nor coroutines suit CPU-bound work

Consider a CPU-bound task: running a complex computation over a large amount of data (for example, data cleaning or feature engineering in an ETL job).

python
import time
import threading

def cpu_intensive(n: int) -> int:
    """Simulate a CPU-bound computation"""
    result = 0
    for i in range(n):
        result += i * i
    return result

# Serial
start = time.time()
for _ in range(4):
    cpu_intensive(10_000_000)
print(f"serial: {time.time() - start:.2f}s")

# Multithreaded (almost no speedup: the GIL keeps the 4 threads from running in parallel)
start = time.time()
threads = [threading.Thread(target=cpu_intensive, args=(10_000_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"threads: {time.time() - start:.2f}s")  # about the same as serial, or even slower

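4.2 Multiple processes: each process has its own GIL

multiprocessing sidesteps the GIL in the most direct way possible: it starts several processes, each with its own Python interpreter and its own GIL, so they run in true parallel on multiple CPU cores.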
[Diagram: the main process (holding its own GIL) hands work to a pool of four worker processes; each worker process has its own GIL and runs on its own CPU core.]

4.3 Pool: the most commonly used process pool interface

multiprocessing.Pool offers a richer interface than ProcessPoolExecutor:

python
import multiprocessing
import time
import os

def process_chunk(chunk: list) -> dict:
    """Process one batch of data; each worker process runs this independently"""
    result = sum(x ** 2 for x in chunk)
    return {"pid": os.getpid(), "chunk_size": len(chunk), "result": result}

# Task functions must be defined at module level so worker processes can look them up;
# a function defined inside the __main__ block is not visible to the workers
def process_with_config(chunk: list, config: dict) -> dict:
    return {"chunk_size": len(chunk), "config": config}

if __name__ == "__main__":
    data = list(range(1_000_000))
    cpu_count = os.cpu_count() or 1   # os.cpu_count() can return None

    # Split the data into at most cpu_count chunks (ceiling division)
    chunk_size = -(-len(data) // cpu_count)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    with multiprocessing.Pool(processes=cpu_count) as pool:

        # map: returns results in order and blocks until everything is done
        start = time.time()
        results = pool.map(process_chunk, chunks)
        print(f"map took {time.time() - start:.2f}s, results: {len(results)}")

        # imap: lazy version; results arrive one at a time, which saves memory
        results_iter = pool.imap(process_chunk, chunks)
        for result in results_iter:
            pass  # results can be handled as they arrive

        # starmap: for task functions that take several arguments
        task_args = [(chunk, {"debug": True}) for chunk in chunks]
        results = pool.starmap(process_with_config, task_args)

        # apply_async: submit a single task without blocking
        future = pool.apply_async(process_chunk, args=(data[:100],))
        result = future.get(timeout=10)   # wait for the result
        print(f"single task result: {result}")
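
The listing above splits the data by a fixed chunk size, so the last chunk can be much smaller than the rest. If balanced chunks matter, one option is to spread the remainder across the first few chunks. A minimal sketch; `split_evenly` is a hypothetical helper written for this example, not part of `multiprocessing`:

```python
def split_evenly(items: list, n_chunks: int) -> list:
    """Split items into at most n_chunks lists whose lengths differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)   # the first `extra` chunks get one more item
        if size == 0:
            break                               # fewer items than chunks: stop early
        chunks.append(items[start:start + size])
        start += size
    return chunks

if __name__ == "__main__":
    print([len(c) for c in split_evenly(list(range(10)), 3)])
```

The resulting list can be passed to `pool.map` in place of `chunks`.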

4.4 ProcessPoolExecutor: the more modern interface

concurrent.futures.ProcessPoolExecutor has the same interface as ThreadPoolExecutor and is the recommended choice for new code:

python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
from typing import Optional
import time

def process_single_file(file_path: str) -> dict:
    """
    ETL logic for a single data file.
    Each worker process runs independently of the others.
    """
    start = time.time()
    path = Path(file_path)
    row_count = 0

    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            cleaned = line.strip()
            if cleaned:
                _ = sum(ord(c) ** 2 for c in cleaned[:50])  # simulate CPU work
                row_count += 1

    return {
        "file": path.name,
        "rows": row_count,
        "elapsed": round(time.time() - start, 3),
        "pid": os.getpid(),
    }

def parallel_etl(file_paths: list[str], max_workers: Optional[int] = None) -> list[dict]:
    if max_workers is None:
        max_workers = os.cpu_count()

    results = []
    start = time.time()
    print(f"🚀 Starting {max_workers} worker processes for {len(file_paths)} files...")

    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        future_to_file = {
            executor.submit(process_single_file, fp): fp
            for fp in file_paths
        }
        for future in as_completed(future_to_file):
            try:
                result = future.result()
                results.append(result)
                print(f"✅ {result['file']} (PID={result['pid']}, {result['rows']} rows, {result['elapsed']}s)")
            except Exception as e:
                print(f"❌ {future_to_file[future]} failed: {e}")

    print(f"\nTotal time: {round(time.time() - start, 2)}s")
    return results

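4.5 Inter-process communication

The price of multiple processes is that they do not share memory, so anything passed between them has to be serialized (with pickle).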
[Diagram: process 1 → pickle serialization → Queue / Pipe → pickle deserialization → process 2]

python
from multiprocessing import Process, Queue, Pipe, Value, Array
from concurrent.futures import ProcessPoolExecutor
import ctypes

# NOTE: under the spawn start method, the driver code in this listing must run inside
# an `if __name__ == "__main__":` block; it is shown at module level for brevity

# ---- Queue: the most general option, many-to-many ----
def worker_queue(q: Queue, data: list):
    result = sum(x ** 2 for x in data)
    q.put({"result": result, "count": len(data)})

q = Queue()
p = Process(target=worker_queue, args=(q, list(range(1000))))
p.start()
print(q.get())  # {"result": ..., "count": 1000}; read before join() to avoid a deadlock on large items
p.join()

# ---- Pipe: point-to-point, faster than Queue ----
def worker_pipe(conn):
    data = conn.recv()        # receive the input
    result = sum(x ** 2 for x in data)
    conn.send(result)         # send the result back
    conn.close()

parent_conn, child_conn = Pipe()
p = Process(target=worker_pipe, args=(child_conn,))
p.start()
parent_conn.send(list(range(1000)))   # send data to the child process
result = parent_conn.recv()            # receive the result
p.join()

# ---- Shared memory: for large data, avoids the pickle overhead ----
# Option 1: Value and Array (primitive types)
shared_counter = Value(ctypes.c_int, 0)
shared_array = Array(ctypes.c_double, [1.0, 2.0, 3.0])

# Option 2: shared_memory (Python 3.8+, a good fit for numpy arrays)
from multiprocessing import shared_memory
import numpy as np

# The main process creates the shared memory block
shm = shared_memory.SharedMemory(create=True, size=1024 * 1024 * 10)  # 10MB
np_array = np.ndarray((1000, 1000), dtype=np.float64, buffer=shm.buf)
np_array[:] = np.random.rand(1000, 1000)   # write the data

def worker_shared_mem(shm_name: str, shape: tuple):
    """The worker attaches to the shared memory by name and reads it without copying"""
    existing_shm = shared_memory.SharedMemory(name=shm_name)
    array = np.ndarray(shape, dtype=np.float64, buffer=existing_shm.buf)
    result = float(np.sum(array))   # read in place; the data is never pickled
    del array                       # release the view first, or close() raises BufferError
    existing_shm.close()
    return result

# The main process passes only the block's name to the child, not the data
with ProcessPoolExecutor(max_workers=4) as executor:
    future = executor.submit(worker_shared_mem, shm.name, (1000, 1000))
    print(f"shared memory result: {future.result():.2f}")

# Clean up
del np_array  # drop the view into the buffer before closing
shm.close()
shm.unlink()  # free the shared memory block
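
To get a feel for the serialization cost that this section keeps returning to, it can help to measure a pickle round trip directly. The sketch below compares sending a list of numbers with sending only a short name, which is the idea behind the shared-memory approach; `pickle_round_trip` and the block name `"psm_example"` are illustrative, not real identifiers from the code above.

```python
import pickle
import time

def pickle_round_trip(obj):
    """Serialize and deserialize obj; return (payload size in bytes, seconds, restored copy)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed, restored

if __name__ == "__main__":
    data = list(range(1_000_000))
    size, seconds, _ = pickle_round_trip(data)
    print(f"full list: {size} bytes, {seconds * 1000:.1f} ms")

    # With shared memory, only a short name crosses the process boundary
    size, seconds, _ = pickle_round_trip("psm_example")
    print(f"name only: {size} bytes, {seconds * 1000:.3f} ms")
```

Queue, Pipe, and the pool APIs all pay roughly this cost once per argument and once per return value, which is why large inputs favor shared memory.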

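4.6 Things to watch out for with process pools

python
from concurrent.futures import ProcessPoolExecutor
import multiprocessing

# ---- Note 1: task functions must be picklable ----

# ❌ Bad: a lambda cannot be pickled
with ProcessPoolExecutor() as executor:
    future = executor.submit(lambda x: x**2, 5)
    future.result()  # the PicklingError surfaces here, when the result is requested

# ✅ Good: use an ordinary top-level function
def square(x):
    return x ** 2

with ProcessPoolExecutor() as executor:
    executor.submit(square, 5)

# ❌ Bad: submitting a bound method from inside an instance (self may not be picklable)
class Processor:
    def __init__(self, config):
        self.config = config
        self.db = create_connection()  # not picklable (create_connection stands in for your own factory)

    def run(self, data):
        with ProcessPoolExecutor() as executor:
            executor.submit(self.process, data)  # self holds db, so this fails!

    def process(self, data):
        return data

# ✅ Good: pass picklable data and open the connection inside the worker process
def process_with_config(data, config: dict):
    conn = create_connection(config)   # each worker process opens its own connection
    result = conn.process(data)
    conn.close()
    return result

# ---- Note 2: process start methods ----
# Windows and macOS (since Python 3.8) default to spawn; Linux defaults to fork (forkserver from Python 3.14)
# spawn is safer (it does not copy the parent's state) but starts more slowly
# fork is fast, but it copies locks, file handles, etc., and can deadlock

if __name__ == "__main__":   # this guard is mandatory under spawn!
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(4) as pool:
        results = pool.map(square, range(10))

# ---- Note 3: more processes is not always better ----
import os

# CPU-bound: number of processes = number of CPU cores (optimal)
cpu_workers = os.cpu_count()

# Beyond the core count, context-switching overhead makes things slower
# ❌ Bad
with ProcessPoolExecutor(max_workers=100) as executor:  # 100 processes on an 8-core machine is wasteful
    pass

# ✅ Good
with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
    pass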
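
The first note in the listing above is that anything sent to a worker process must be picklable. A quick way to check an object before handing it to a pool is simply to try pickling it. A minimal sketch; `is_picklable` is a hypothetical helper, and a `threading.Lock` stands in for the database connection, since locks are a common unpicklable member of such objects:

```python
import math
import pickle
import threading

def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, i.e. it can be sent to a worker process."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

if __name__ == "__main__":
    print(is_picklable(math.sqrt))         # module-level function: pickled by reference
    print(is_picklable({"debug": True}))   # plain data
    print(is_picklable(lambda x: x ** 2))  # lambda: cannot be looked up by name
    print(is_picklable(threading.Lock()))  # OS-level resource: cannot be pickled
```

Running the check in a unit test for every task function and argument type tends to be cheaper than discovering the failure inside a pool at runtime.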

5. Hybrid patterns and choosing a tool

5.1 asyncio + ProcessPoolExecutor: async coordination plus CPU parallelism

Real data pipelines usually contain both I/O-bound and CPU-bound stages:
[Diagram: multiple data sources → asyncio concurrent fetch (I/O-bound) → raw data → ProcessPool parallel cleaning (CPU-bound) → cleaned data → asyncio concurrent write (I/O-bound) → data warehouse]

python
import asyncio
import aiohttp
from concurrent.futures import ProcessPoolExecutor
import os

# CPU-bound: data cleaning (runs in a worker process)
def clean_and_transform(raw_data: dict) -> dict:
    """This function runs in a separate process, so it can use multiple cores"""
    source = raw_data["source"]
    records = raw_data.get("records", [])

    cleaned = []
    for record in records:
        # Heavy computation: deduplication, format conversion, outlier handling
        processed = {
            "id": record.get("id"),
            "value": sum(ord(c) for c in str(record)) % 1000,
            "source": source,
        }
        cleaned.append(processed)

    return {"source": source, "count": len(cleaned), "data": cleaned}

# I/O-bound: asynchronous fetch (runs on the event loop's main thread)
async def fetch_data(session: aiohttp.ClientSession, source: dict) -> dict:
    async with session.get(source["url"]) as resp:
        # The response body is ignored here; the records are simulated
        return {
            "source": source["name"],
            "records": [{"id": i, "val": i * 2} for i in range(100)],
        }

async def data_pipeline(sources: list) -> list:
    """
    Complete data pipeline:
    1. asyncio fetches multiple data sources concurrently (I/O-bound)
    2. ProcessPoolExecutor cleans the data in parallel (CPU-bound)
    """
    loop = asyncio.get_running_loop()   # preferred over get_event_loop() inside a coroutine

    # Stage 1: concurrent fetch (I/O-bound)
    print("📥 Fetching data concurrently...")
    async with aiohttp.ClientSession() as session:
        fetch_tasks = [fetch_data(session, s) for s in sources]
        raw_results = await asyncio.gather(*fetch_tasks)
    print(f"✅ Fetch complete, {len(raw_results)} data sources")

    # Stage 2: parallel cleaning (CPU-bound)
    print("⚙️ Cleaning data in parallel...")
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        # run_in_executor submits a synchronous function to the pool and returns an awaitable Future
        clean_tasks = [
            loop.run_in_executor(executor, clean_and_transform, raw)
            for raw in raw_results
        ]
        cleaned_results = await asyncio.gather(*clean_tasks)

    total_rows = sum(r["count"] for r in cleaned_results)
    print(f"✅ Cleaning complete, {total_rows} records processed")
    return cleaned_results

if __name__ == "__main__":
    sources = [
        {"name": f"source_{i}", "url": "https://httpbin.org/json"}
        for i in range(5)
    ]
    asyncio.run(data_pipeline(sources))

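💡 loop.run_in_executor(executor, func, *args) is the bridge between asyncio and process or thread pools. It submits a synchronous function to the executor, runs it there, and returns an awaitable Future, so the event loop is never blocked.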
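
Because `run_in_executor` forwards only positional arguments, keyword arguments have to be bound beforehand, typically with `functools.partial`. A minimal sketch, assuming a hypothetical `scale` function and using a thread pool to keep the example light; with a `ProcessPoolExecutor` the same pattern applies as long as the wrapped function is a picklable top-level function:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def scale(values: list, factor: int = 1, offset: int = 0) -> list:
    """Hypothetical synchronous function that takes keyword arguments."""
    return [v * factor + offset for v in values]

async def scale_in_executor(values: list, factor: int, offset: int) -> list:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Bind the keyword arguments first; run_in_executor only accepts *args
        call = partial(scale, values, factor=factor, offset=offset)
        return await loop.run_in_executor(executor, call)

if __name__ == "__main__":
    print(asyncio.run(scale_in_executor([1, 2, 3], factor=10, offset=1)))
```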

5.2 threading + asyncio: running synchronous blocking tasks from async code

Sometimes synchronous code cannot be rewritten (legacy libraries, third-party SDKs), yet it still has to be called from async code:

python
import asyncio
import requests  # synchronous library

async def fetch_with_sync_lib(url: str) -> str:
    """Safely call a synchronous blocking function from asyncio"""
    loop = asyncio.get_running_loop()

    # run_in_executor uses a thread pool by default, so the event loop is not blocked
    response = await loop.run_in_executor(
        None,          # None means the default thread pool
        requests.get,  # synchronous function
        url            # argument
    )
    return response.text

# Run several synchronous calls concurrently
async def fetch_multiple(urls: list) -> list:
    loop = asyncio.get_running_loop()
    tasks = [
        loop.run_in_executor(None, requests.get, url)
        for url in urls
    ]
    responses = await asyncio.gather(*tasks)
    return [r.text for r in responses]

asyncio.run(fetch_multiple(["https://httpbin.org/get"] * 5))
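
On Python 3.9 and later, `asyncio.to_thread` is a shorter way to express the same idea: it runs a synchronous function in the default thread pool and also accepts keyword arguments. The sketch below swaps the HTTP call for a hypothetical `blocking_lookup` that just sleeps, so it runs without network access:

```python
import asyncio
import time

def blocking_lookup(key: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a blocking SDK or legacy library call."""
    time.sleep(delay)          # blocks the calling thread, not the event loop
    return key.upper()

async def lookup_all(keys: list) -> list:
    # Each call runs in its own worker thread; gather keeps the input order
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_lookup, key, delay=0.1) for key in keys)
    )

if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(lookup_all(["a", "b", "c"])))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because the three sleeps overlap in separate threads, the elapsed time should be close to one delay rather than three.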

5.3 Decision tree for choosing a tool

[Figure: decision flowchart for choosing a concurrency tool; the branches recoverable from the diagram are listed below]

  • What type of task is it?
    • I/O-bound (network / file / database): is there a large amount of legacy synchronous code (requests, pymysql, etc.)?
      • Yes: threading / ThreadPoolExecutor
      • No, new project: asyncio (aiohttp / asyncpg)
    • CPU-bound (numerical computation / data processing)
      • Pure-Python computation: multiprocessing / ProcessPoolExecutor
      • NumPy/SciPy computation: NumPy releases the GIL in its C layer, so ThreadPoolExecutor is effective too
    • Both I/O and CPU: asyncio + ProcessPoolExecutor (hybrid mode)
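
The decision flow in this section can be written down as a small lookup function. This is only a sketch: `TaskType`, `choose_concurrency_tool`, and its parameter names are hypothetical helpers invented for illustration, and real workloads usually need measurement rather than a three-way switch.

```python
from enum import Enum


class TaskType(Enum):
    IO_BOUND = "io"
    CPU_BOUND = "cpu"
    MIXED = "mixed"  # both I/O and CPU work in one pipeline


def choose_concurrency_tool(
    task_type: TaskType,
    legacy_sync_code: bool = False,
    numeric_c_extension: bool = False,
) -> str:
    """Return the tool suggested by the decision flow (hypothetical helper)."""
    if task_type is TaskType.MIXED:
        return "asyncio + ProcessPoolExecutor"
    if task_type is TaskType.IO_BOUND:
        # Blocking libraries such as requests or pymysql fit threads better.
        return "ThreadPoolExecutor" if legacy_sync_code else "asyncio"
    # CPU-bound: NumPy/SciPy release the GIL in C, so threads can still help.
    return "ThreadPoolExecutor" if numeric_c_extension else "ProcessPoolExecutor"


print(choose_concurrency_tool(TaskType.IO_BOUND, legacy_sync_code=True))
print(choose_concurrency_tool(TaskType.CPU_BOUND))
```

Encoding the choice as data makes it easy to review in a design document, but treat the result as a starting point, not a verdict.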

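5.4 Performance Reference Comparison

Example: concurrently handling 10 I/O tasks that each take 1 second.

| Approach | Theoretical time | Memory overhead | Suitable number of tasks |
|---|---|---|---|
| Serial | ~10s | Very low | No concurrency |
| threading (10 threads) | ~1s | Medium (~80MB) | Tens to hundreds |
| asyncio | ~1s | Very low (<5MB) | Hundreds to tens of thousands |
| multiprocessing (4 processes) | ~3s | High (process startup costs 50-100ms each) | Not suited to I/O-bound work |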
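
To check the first three rows on your own machine, a scaled-down timing harness is enough. The sketch below assumes `time.sleep` and `asyncio.sleep` are fair stand-ins for blocking and awaitable I/O, and shrinks each task from 1 second to 0.1 seconds so it finishes quickly; the function names are made up for this example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

TASKS = 10
DELAY = 0.1  # seconds; scaled down from the 1 s used in the table


def blocking_io(_: int) -> None:
    time.sleep(DELAY)  # stands in for a blocking network or database call


async def async_io(_: int) -> None:
    await asyncio.sleep(DELAY)  # stands in for an awaitable network call


def time_serial() -> float:
    start = time.perf_counter()
    for i in range(TASKS):
        blocking_io(i)
    return time.perf_counter() - start


def time_threads() -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=TASKS) as pool:
        list(pool.map(blocking_io, range(TASKS)))
    return time.perf_counter() - start


def time_asyncio() -> float:
    async def run_all() -> None:
        await asyncio.gather(*(async_io(i) for i in range(TASKS)))

    start = time.perf_counter()
    asyncio.run(run_all())
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"serial : {time_serial():.2f}s")
    print(f"threads: {time_threads():.2f}s")
    print(f"asyncio: {time_asyncio():.2f}s")
```

Real I/O adds latency variance and connection limits, so expect the gap between threads and asyncio to show up in memory and scalability rather than in this toy timing.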

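Example: processing 4 CPU-bound tasks of 2 seconds each in parallel on an 8-core machine.

| Approach | Theoretical time | Notes |
|---|---|---|
| Serial | ~8s | Baseline |
| threading (4 threads) | ~8s | No gain because of the GIL |
| multiprocessing (4 processes) | ~2s | True parallelism, about 4x faster |
| asyncio | ~8s | Single thread, no help for CPU-bound work |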

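📊 Practical testing advice: starting a process has a fixed cost (roughly 50-100ms), so for small workloads multiprocessing can end up slower than serial code. Rule of thumb: multiprocessing pays off only when a single task runs longer than about 200ms.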
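
The rule of thumb can be sanity-checked with back-of-the-envelope arithmetic. The model below is an assumption made for illustration, not a measurement: it charges a fixed startup cost per worker (defaulting to 75ms, the midpoint of the range above), pessimistically assumes workers start one after another, and ignores pickling and scheduling overhead.

```python
import math


def estimated_wall_time(
    n_tasks: int,
    task_seconds: float,
    workers: int,
    startup_seconds: float = 0.075,  # assumed midpoint of the 50-100 ms range
) -> float:
    """Rough wall time for a process pool: startup cost plus batches of tasks."""
    if workers <= 1:
        return n_tasks * task_seconds  # serial baseline, no pool started
    batches = math.ceil(n_tasks / workers)
    # Pessimistic assumption: workers are started one after another.
    return workers * startup_seconds + batches * task_seconds


def speedup(n_tasks: int, task_seconds: float, workers: int) -> float:
    serial = estimated_wall_time(n_tasks, task_seconds, workers=1)
    return serial / estimated_wall_time(n_tasks, task_seconds, workers)


for task_seconds in (0.01, 0.2, 2.0):
    print(f"{task_seconds:>5}s per task -> {speedup(4, task_seconds, 4):.2f}x")
```

Under these assumptions a 10ms task comes out slower than serial, a 200ms task gains modestly, and a 2-second task approaches the ideal speedup, which is the shape the advice above describes.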


6. Common Pitfalls and Best Practices

Pitfall 1: Calling synchronous blocking I/O inside async def

python
import asyncio

import aiohttp
import requests

# ❌ Bad: blocks the whole event loop, so every other coroutine stalls
async def bad_fetch(url: str):
    response = requests.get(url)  # Synchronous and blocking! The event loop cannot schedule anything else
    return response.text

# ✅ Option 1: switch to an async library
async def good_fetch_aiohttp(url: str):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()

# ✅ Option 2: hand the synchronous call to a thread pool with run_in_executor
async def good_fetch_executor(url: str):
    loop = asyncio.get_running_loop()  # Preferred over get_event_loop() inside a coroutine
    response = await loop.run_in_executor(None, requests.get, url)
    return response.text
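
On Python 3.9 and later, `asyncio.to_thread` is a shorter way to express the second option. The sketch below uses a hypothetical `blocking_fetch` built on `time.sleep` in place of `requests.get`, so it runs without network access; the URLs are placeholders.

```python
import asyncio
import time


def blocking_fetch(url: str) -> str:
    """Hypothetical stand-in for a blocking call such as requests.get(url).text."""
    time.sleep(0.1)
    return f"body of {url}"


async def fetch_in_thread(url: str) -> str:
    # asyncio.to_thread runs the function in the default thread pool and
    # returns an awaitable, so the event loop stays free while it blocks.
    return await asyncio.to_thread(blocking_fetch, url)


async def fetch_all(urls: list[str]) -> list[str]:
    return await asyncio.gather(*(fetch_in_thread(u) for u in urls))


if __name__ == "__main__":
    start = time.perf_counter()
    bodies = asyncio.run(fetch_all(["https://example.com/a", "https://example.com/b"]))
    print(bodies, f"{time.perf_counter() - start:.2f}s")
```

The calls overlap because each one blocks its own worker thread, not the event loop. The default pool is bounded, so very large fan-outs are still better served by a native async client.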

Pitfall 2: asyncio.sleep vs time.sleep

python
import asyncio
import time

# ❌ Bad: time.sleep blocks synchronously and stalls the event loop
async def bad_wait():
    time.sleep(1)  # The whole event loop is frozen for 1 second; no other coroutine can run

# ✅ Good: asyncio.sleep yields control
async def good_wait():
    await asyncio.sleep(1)  # The event loop can schedule other coroutines during this second
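
The difference is easy to observe by running a heartbeat next to each kind of sleep and counting how often it gets to tick. This is an illustrative sketch; `heartbeat` and `ticks_during` are names invented for the demo.

```python
import asyncio
import time


async def heartbeat(counter: list[int], interval: float = 0.01) -> None:
    """Increment counter[0] every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        counter[0] += 1


async def ticks_during(use_blocking_sleep: bool, duration: float = 0.2) -> int:
    counter = [0]
    beat = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)  # let the heartbeat task start
    if use_blocking_sleep:
        time.sleep(duration)           # freezes the whole event loop
    else:
        await asyncio.sleep(duration)  # yields to other coroutines
    ticks = counter[0]
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return ticks


if __name__ == "__main__":
    print("time.sleep   :", asyncio.run(ticks_during(True)))
    print("asyncio.sleep:", asyncio.run(ticks_during(False)))
```

With `time.sleep` the heartbeat never gets a turn during the wait, while with `asyncio.sleep` it ticks many times.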

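Pitfall 3: Forgetting await, so the coroutine never runs

python
# fetch_data is a placeholder for any coroutine function

# ❌ Bad: without await, the coroutine object is created but never executed
async def main():
    result = fetch_data()  # Only creates a coroutine object! Python later emits "RuntimeWarning: coroutine 'fetch_data' was never awaited"
    print(result)          # Prints <coroutine object fetch_data at 0x...>

# ✅ Good
async def main():
    result = await fetch_data()
    print(result)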
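
A runnable version makes the distinction concrete: calling a coroutine function only builds a coroutine object, and the body runs when that object is awaited. Here `fetch_data` is a hypothetical coroutine returning a small dict.

```python
import asyncio
import inspect


async def fetch_data() -> dict:
    """Hypothetical coroutine standing in for a real async call."""
    await asyncio.sleep(0)
    return {"rows": 3}


async def main() -> dict:
    pending = fetch_data()               # nothing has run yet
    assert inspect.iscoroutine(pending)  # it is a coroutine object, not a dict
    result = await pending               # now the body actually executes
    assert isinstance(result, dict)
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

When hunting for a missing await in a larger codebase, running with asyncio debug mode (for example `asyncio.run(main(), debug=True)`) should make the "never awaited" warning more informative.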

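Pitfall 4: A create_task task is garbage-collected before it finishes

python
import asyncio

# some_coroutine is a placeholder for any coroutine function

# ❌ Bad: nothing references the task; the event loop keeps only a weak reference, so it may be garbage-collected mid-flight
async def bad():
    asyncio.create_task(some_coroutine())  # No reference kept!

# ✅ Good: keep a reference to the task, or manage it with a TaskGroup
background_tasks = set()

async def good():
    task = asyncio.create_task(some_coroutine())
    background_tasks.add(task)                        # Strong reference prevents GC
    task.add_done_callback(background_tasks.discard)  # Drop the reference once the task is done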

Pitfall 5: ProcessPoolExecutor's pickle restriction

python
from concurrent.futures import ProcessPoolExecutor

# create_db_connection, config, and data are placeholders for your own code

class DataProcessor:
    def __init__(self, config):
        self.config = config
        self.db_conn = create_db_connection(config)  # ❌ A database connection cannot be pickled!

    def process(self, data):
        return self.db_conn.query(data)

processor = DataProcessor(config)

# ❌ Submitting a bound method pickles the whole instance, including db_conn.
# The pickling error (typically TypeError or PicklingError) surfaces from future.result()
with ProcessPoolExecutor() as executor:
    future = executor.submit(processor.process, data)
    future.result()

# ✅ Good: create the connection inside the worker process
def process_with_connection(data, config):
    conn = create_db_connection(config)  # Opened in the worker, once per call
    try:
        return conn.query(data)
    finally:
        conn.close()

with ProcessPoolExecutor() as executor:
    executor.submit(process_with_connection, data, config)

Pitfall 6: Inconsistent lock ordering causes deadlock

python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# ❌ Bad: the two threads take the locks in opposite order, so they can deadlock
def thread1():
    with lock_a:
        with lock_b:   # Waits for lock_b (which thread2 may hold)
            pass

def thread2():
    with lock_b:
        with lock_a:   # Waits for lock_a (which thread1 may hold)
            pass

# ✅ Good: always acquire locks in the same order
def thread1_safe():
    with lock_a:
        with lock_b:
            pass

def thread2_safe():
    with lock_a:   # Same order as thread1_safe
        with lock_b:
            pass
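
When many call sites take the same locks, the ordering rule is easier to enforce in one helper than by convention. The sketch below sorts locks by `id()` before acquiring them; `acquire_in_order` is a hypothetical helper, and sorting by identity is just one possible global order.

```python
import threading
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_in_order(*locks):
    """Acquire all locks in one global order (by id), release in reverse."""
    with ExitStack() as stack:
        for lock in sorted(locks, key=id):
            stack.enter_context(lock)
        yield


lock_a = threading.Lock()
lock_b = threading.Lock()
counter = 0


def worker(first, second) -> None:
    global counter
    for _ in range(1000):
        # Callers may pass the locks in any order; the helper normalizes it.
        with acquire_in_order(first, second):
            counter += 1


t1 = threading.Thread(target=worker, args=(lock_a, lock_b))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
```

Both threads finish even though they name the locks in opposite order, because the helper always acquires them in the same sequence.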

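Pitfall 7: Fork safety in multiprocessing

python
import multiprocessing

# ❌ With the fork start method, the child gets a copy of the parent's memory but only the forking thread.
# fork has been the default on Linux (Python 3.14 switches to forkserver); macOS has defaulted to spawn since Python 3.8.
# If another parent thread holds a lock at fork time (connection pool, logging lock, etc.),
# the child inherits it in the locked state with no thread left to release it → deadlock

# ✅ Set the start method explicitly (consistent behavior across platforms)
if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")  # Safe but slower than fork (each worker re-imports the modules)
    # Or choose it for a single Pool/ProcessPoolExecutor via a context
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(4) as pool:
        results = pool.map(process_func, data_list)  # process_func and data_list are placeholders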
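
Before changing anything, it helps to see what the current platform offers. The sketch below reports the default and available start methods and builds a ProcessPoolExecutor pinned to spawn through `mp_context`. Both function names are invented for this example.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def describe_start_methods() -> dict:
    """Report the platform default and every start method available here."""
    return {
        # Note: get_start_method() fixes the default as a side effect, so a
        # later set_start_method() call would need force=True.
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


def make_spawn_executor(max_workers: int = 4) -> ProcessPoolExecutor:
    # mp_context pins the start method for this pool only, without touching
    # the process-wide default the way set_start_method() does.
    ctx = multiprocessing.get_context("spawn")
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx)


if __name__ == "__main__":
    print(describe_start_methods())
```

Scoping the start method to one executor is handy in libraries, where changing the global default could surprise the calling application.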

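Pitfall 8: More threads or processes is not always better

python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

cpu_cores = os.cpu_count() or 1  # cpu_count() can return None

# ❌ Bad: blindly starting huge numbers of processes/threads
with ProcessPoolExecutor(max_workers=100) as executor:  # 100 processes on an 8-core machine spend much of their time context switching
    pass
with ThreadPoolExecutor(max_workers=1000) as executor:  # 1000 threads: heavy memory and scheduling overhead
    pass

# ✅ CPU-bound: number of processes = number of CPU cores
with ProcessPoolExecutor(max_workers=cpu_cores) as executor:
    pass

# ✅ I/O-bound: more threads are fine, but there is still a ceiling
# ThreadPoolExecutor defaults to min(32, os.cpu_count() + 4)
io_workers = min(32, cpu_cores * 4)   # Rule of thumb
with ThreadPoolExecutor(max_workers=io_workers) as executor:
    pass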
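
These sizing rules can live in two small helpers so every pool in a codebase is sized the same way. This is a sketch: the function names, the `reserve` parameter, and the default multiplier of 4 are assumptions taken from the rule of thumb above, not library defaults.

```python
import os


def cpu_bound_workers(reserve: int = 0) -> int:
    """Process count for CPU-bound work: one per core, minus any reserved cores."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores - reserve)


def io_bound_workers(multiplier: int = 4, cap: int = 32) -> int:
    """Thread count for I/O-bound work: a multiple of the core count, capped."""
    cores = os.cpu_count() or 1
    return max(1, min(cap, cores * multiplier))


print(cpu_bound_workers(), io_bound_workers())
```

In containers, `os.cpu_count()` may report the host's cores rather than the CPU quota, so an explicit override is worth considering there.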

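7. Python 3.13's No-GIL Mode (Outlook)

Python 3.13 introduced an experimental free-threaded mode: a separate build of CPython, configured with --disable-gil, that runs without the GIL:

bash
# Install a free-threaded build of Python 3.13 (experimental)
# pyenv install 3.13t  (the t suffix means free-threaded)

python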
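import sys
import threading
import time

# Check whether the GIL is active; prints False when running free-threaded
print(sys._is_gil_enabled())  # Available in Python 3.13+ (private API, may change)

# In free-threaded mode, CPU-bound threads can run truly in parallel
results = []
def cpu_task():
    result = sum(i * i for i in range(10_000_000))
    results.append(result)

# On a free-threaded build this code can use multiple cores
start = time.perf_counter()
threads = [threading.Thread(target=cpu_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Elapsed: {time.perf_counter() - start:.2f}s")  # Ideally close to 1/4 of the serial time when 4 cores are free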

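⚠️ Note: in 3.13 the free-threaded mode is experimental and has known issues:

  • Single-threaded code runs noticeably slower: roughly 40% on the pyperformance suite in 3.13, narrowed to about 5-10% in 3.14
  • Some C extensions are not yet compatible
  • Not recommended for production use

PEP 703 lays out a phased rollout rather than a fixed date: Python 3.14 makes the free-threaded build officially supported but still optional (PEP 779), and making it the default is left to a later release that has not been decided. Either way, this is a major turning point for Python concurrency.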
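
Because `sys._is_gil_enabled()` exists only on 3.13+, code that must run on older interpreters needs a guarded check. The sketch below combines it with the `Py_GIL_DISABLED` build variable from `sysconfig`; `gil_status` is a hypothetical helper name.

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe the interpreter build and whether the GIL is active right now."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; unset or 0 otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "standard build (GIL always enabled)"
    # A free-threaded build can still re-enable the GIL at runtime, for example
    # via PYTHON_GIL=1 or when an incompatible C extension is imported.
    checker = getattr(sys, "_is_gil_enabled", None)
    if checker is not None and checker():
        return "free-threaded build, GIL re-enabled at runtime"
    return "free-threaded build, GIL disabled"


print(gil_status())
```

Logging this at startup makes it clear which mode a benchmark or production process actually ran in.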


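8. Putting It Together: Building a Complete Data Collection Pipeline

Bringing everything together, here is a near-production multi-source collection system:

python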
"""
Scenario: data collection pipeline for an e-commerce BI system
1. Fetch today's orders from 7 channels concurrently (asyncio, I/O-bound)
2. Clean each channel's data and extract features in parallel (ProcessPool, CPU-bound)
3. Use a semaphore to cap concurrency so the downstream API is not overwhelmed
4. Record every error separately so it does not affect other channels

Requires Python 3.11+ (asyncio.timeout)
"""
import asyncio
import os
import random
import time
import zlib
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Optional

# ---- Data models ----
@dataclass
class RawOrder:
    order_id: str
    channel: str
    amount: float
    items: list

@dataclass
class PipelineResult:
    channel: str
    success: bool
    record_count: int = 0
    error: Optional[str] = None
    elapsed: float = 0.0

# ---- CPU-bound: data cleaning (runs in worker processes) ----
def clean_orders(raw_orders: list[dict], channel: str) -> dict:
    """
    Data cleaning: deduplicate, normalize formats, filter outliers.
    This is CPU-bound work and runs in a separate process.
    """
    seen_ids = set()
    cleaned = []
    for order in raw_orders:
        # Deduplicate
        if order["order_id"] in seen_ids:
            continue
        seen_ids.add(order["order_id"])

        # Filter outliers
        if order["amount"] <= 0 or order["amount"] > 100000:
            continue

        # Format conversion (simulated CPU work)
        features = {
            "order_id": order["order_id"],
            "channel": channel,
            "amount": round(order["amount"], 2),
            "item_count": len(order.get("items", [])),
            # Simulated feature extraction; crc32 is stable across processes,
            # unlike the built-in hash(), which is salted per process for str
            "amount_hash": zlib.crc32(str(order["amount"]).encode()) % 10000,
        }
        cleaned.append(features)

    return {
        "channel": channel,
        "original_count": len(raw_orders),
        "cleaned_count": len(cleaned),
        "data": cleaned,
    }
# ---- I/O-bound: data fetching (coroutine) ----
async def fetch_channel_data(
    channel: str,
    semaphore: asyncio.Semaphore,
    timeout: float = 10.0
) -> list[dict]:
    """
    Simulate fetching data from one e-commerce channel.
    In a real system, replace this with an aiohttp call.
    """
    async with semaphore:
        try:
            async with asyncio.timeout(timeout):
                # Simulate network latency
                await asyncio.sleep(random.uniform(0.5, 2.0))

                # Simulate an occasional failure
                if random.random() < 0.1:
                    raise ConnectionError(f"{channel} API returned 500")

                # Simulate the returned order data
                orders = [
                    {
                        "order_id": f"{channel}-{i:06d}",
                        "channel": channel,
                        "amount": random.uniform(10, 5000),
                        "items": [f"item_{j}" for j in range(random.randint(1, 5))],
                    }
                    for i in range(random.randint(100, 1000))
                ]
                return orders

        except TimeoutError as exc:
            # asyncio.timeout raises the built-in TimeoutError on Python 3.11+
            raise TimeoutError(f"{channel} request timed out") from exc

# ---- Main pipeline ----
async def run_pipeline(channels: list[str]) -> list[PipelineResult]:
    loop = asyncio.get_running_loop()
    results = []
    semaphore = asyncio.Semaphore(3)  # At most 3 requests in flight at once
    start = time.time()

    # Stage 1: fetch all channels concurrently
    print(f"📥 Fetching data from {len(channels)} channels...")
    fetch_tasks = {
        channel: asyncio.create_task(fetch_channel_data(channel, semaphore))
        for channel in channels
    }

    raw_data_map: dict[str, Optional[list]] = {}
    for channel, task in fetch_tasks.items():
        try:
            raw_data_map[channel] = await task
            print(f"  ✅ {channel}: fetched {len(raw_data_map[channel])} records")
        except Exception as e:
            raw_data_map[channel] = None
            results.append(PipelineResult(channel=channel, success=False, error=str(e)))
            print(f"  ❌ {channel}: {e}")

    # Drop the channels that failed
    successful_data = {k: v for k, v in raw_data_map.items() if v is not None}
    print(f"\n⚙️ Cleaning data from {len(successful_data)} channels in parallel...")

    # Stage 2: parallel cleaning (CPU-bound)
    cleaned_results = []
    if successful_data:  # max_workers=0 would raise ValueError
        workers = min(os.cpu_count() or 1, len(successful_data))
        with ProcessPoolExecutor(max_workers=workers) as executor:
            clean_tasks = [
                loop.run_in_executor(executor, clean_orders, raw_data, channel)
                for channel, raw_data in successful_data.items()
            ]
            cleaned_results = await asyncio.gather(*clean_tasks, return_exceptions=True)

    # gather preserves order, so results line up with the channel names
    for channel, result in zip(successful_data, cleaned_results):
        if isinstance(result, Exception):
            results.append(PipelineResult(channel=channel, success=False, error=str(result)))
            print(f"  ❌ {channel}: cleaning failed: {result}")
        else:
            ratio = result["cleaned_count"] / result["original_count"] * 100
            print(f"  ✅ {channel}: {result['original_count']} → {result['cleaned_count']} records ({ratio:.0f}% kept)")
            results.append(PipelineResult(
                channel=channel,
                success=True,
                record_count=result["cleaned_count"],
                elapsed=round(time.time() - start, 2),
            ))

    total = round(time.time() - start, 2)
    success_count = sum(1 for r in results if r.success)
    print(f"\n🎉 Pipeline finished! Total time {total}s, {success_count}/{len(channels)} channels succeeded")
    return results

if __name__ == "__main__":
    channels = ["Taobao", "JD", "Pinduoduo", "Douyin", "Kuaishou", "Vipshop", "Suning"]
    asyncio.run(run_pipeline(channels))
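
The pipeline awaits its fetch tasks one by one to isolate failures. An alternative sketch of the same isolation uses `asyncio.gather(..., return_exceptions=True)`, which the cleaning stage already relies on. The `fetch` coroutine and the channel names here are hypothetical and much simpler than the fetcher above.

```python
import asyncio


async def fetch(channel: str) -> list[int]:
    """Hypothetical fetcher: channel "bad" always fails, others return rows."""
    await asyncio.sleep(0.01)
    if channel == "bad":
        raise ConnectionError(f"{channel} API returned 500")
    return [1, 2, 3]


async def fetch_all(channels: list[str]) -> tuple[dict, dict]:
    # return_exceptions=True turns each failure into a result value, so one
    # failing channel neither cancels nor hides the others.
    outcomes = await asyncio.gather(
        *(fetch(c) for c in channels), return_exceptions=True
    )
    data, errors = {}, {}
    for channel, outcome in zip(channels, outcomes):
        if isinstance(outcome, Exception):
            errors[channel] = str(outcome)
        else:
            data[channel] = outcome
    return data, errors


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "bad", "b"])))
```

Because gather preserves input order, zipping its results with the channel list is enough to attribute each error to the right source.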

9. Summary

[Figure: mind map of the topics covered]

  • Python concurrency
    • GIL
      • Protects reference counting
      • 5 ms switch interval
      • Released during I/O and inside C extensions
      • No-GIL experiment in 3.13
    • threading
      • Thread lifecycle
      • ThreadPoolExecutor
      • Lock / RLock
      • Semaphore for rate limiting
      • Event for signaling
      • Condition for conditional waits
      • threading.local for per-thread state
      • queue.Queue, a thread-safe queue
    • asyncio
      • Event Loop
      • async/await
      • Task / Future
      • gather / TaskGroup
      • Timeout control
      • Semaphore for limiting concurrency
      • async for / async with
      • asyncio.Queue
    • multiprocessing
      • Pool.map / starmap
      • ProcessPoolExecutor
      • Queue / Pipe communication
      • shared_memory
      • spawn vs fork
    • Hybrid mode
      • asyncio + ProcessPool
      • run_in_executor

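| Tool | Core mechanism | Best for | Poor fit | Key parameter |
|---|---|---|---|---|
| threading | OS threads taking turns under the GIL | I/O-bound work with legacy synchronous code | CPU-bound computation | max_workers = min(32, cpu+4) |
| asyncio | Single-threaded event loop | High-concurrency I/O, new projects | Legacy synchronous code, CPU-bound work | Semaphore to limit concurrency |
| multiprocessing | Multiple processes, each with its own GIL | CPU-bound work, data parallelism | High-concurrency I/O, frequent inter-process communication | max_workers = cpu_count() |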

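🎯 One-sentence summary: the GIL means Python threads take turns rather than run in parallel; coroutines replace preemptive switching with cooperative yielding, which is highly efficient for high-concurrency I/O; when pure-Python code needs true parallelism, multiprocessing is the standard way around the GIL; and the experimental no-GIL build in Python 3.13 is quietly starting to change that picture.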

