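A Roundup of General-Purpose Python Utility Packages
To do good work, first sharpen your tools. The packages below can noticeably speed up everyday Python development.
Preface
Much of the strength of the Python ecosystem comes from its third-party libraries. This article walks through a selection of general-purpose packages covering HTTP requests, data models, configuration management, logging, command-line interfaces, and async/concurrency. Each tool comes with code examples you can adapt right away.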
1. HTTP Clients: The Big Three
1.1 requests - a simple HTTP client
Role: the most widely used HTTP library and the usual first choice for calling REST APIs
Install:
bash
pip install requests
Basic usage:
python
import requests

# GET request
response = requests.get('https://api.github.com/users/octocat')
print(response.status_code)  # 200 on success
print(response.json())       # parse the JSON body

# GET with query parameters
params = {'q': 'python', 'sort': 'stars'}
response = requests.get('https://api.github.com/search/repositories', params=params)

# POST request with a JSON body
data = {'title': 'foo', 'body': 'bar', 'userId': 1}
response = requests.post('https://jsonplaceholder.typicode.com/posts', json=data)
print(response.json())

# Custom headers
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
response = requests.get('https://api.example.com/protected', headers=headers)

# Timeout (requests has no default timeout, so set one explicitly)
try:
    response = requests.get('https://api.example.com', timeout=5)
except requests.Timeout:
    print("Request timed out")
Advanced usage:
python
import requests

# A Session reuses connections across requests
session = requests.Session()
session.auth = ('user', 'pass')
session.headers.update({'User-Agent': 'my-app'})

# Every request sent through the session carries the auth and headers
r1 = session.get('https://api.example.com/users')
r2 = session.get('https://api.example.com/posts')

# File upload (the with block closes the file handle afterwards)
with open('report.pdf', 'rb') as fh:
    response = requests.post('https://api.example.com/upload', files={'file': fh})

# Download a large file in chunks (streaming)
with requests.get('https://example.com/large.zip', stream=True) as r:
    r.raise_for_status()
    with open('large.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
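1.2 httpx - a modern HTTP library with sync and async APIs
Role: a modern alternative to requests with a very similar API, plus async support and optional HTTP/2
Install:
bash
pip install httpx
Synchronous usage (similar to requests):
python
import httpx

with httpx.Client() as client:
    # GET request
    response = client.get('https://api.github.com/users/octocat')
    print(response.json())

    # httpx does not follow redirects by default; opt in per request
    response = client.get('https://httpbin.org/redirect/3', follow_redirects=True)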
Asynchronous usage (the main advantage):
python
import httpx
import asyncio

async def fetch_data():
    async with httpx.AsyncClient() as client:
        # A single request
        response = await client.get('https://api.example.com/data')
        return response.json()

async def fetch_concurrent():
    async with httpx.AsyncClient() as client:
        # Concurrent requests
        urls = [
            'https://api.example.com/users',
            'https://api.example.com/posts',
            'https://api.example.com/comments'
        ]
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        for resp in responses:
            print(resp.status_code, resp.json())

# Run
asyncio.run(fetch_data())
asyncio.run(fetch_concurrent())
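HTTP/2 support:
python
import httpx

# HTTP/2 is an optional extra: pip install "httpx[http2]"
with httpx.Client(http2=True) as client:
    response = client.get('https://api.example.com')
    print(response.http_version)  # "HTTP/2" if the server negotiated it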
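1.3 urllib3 - the low-level HTTP library
Role: the foundation under many higher-level libraries (including requests); a good fit when you need fine-grained control
Install:
bash
pip install urllib3
Basic usage:
python
import urllib3

# Create a connection pool manager
http = urllib3.PoolManager()

# GET request
response = http.request('GET', 'https://api.github.com/users/octocat')
print(response.status)
print(response.data.decode('utf-8'))

# With query parameters (for GET, fields are encoded into the URL)
response = http.request('GET', 'https://httpbin.org/get', fields={'key': 'value'})

# POST request (for POST, fields are sent as a form-encoded body)
response = http.request(
    'POST',
    'https://httpbin.org/post',
    fields={'title': 'foo', 'body': 'bar'}
)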
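Connection pool configuration:
python
import urllib3

http = urllib3.PoolManager(
    maxsize=10,   # connections kept in each per-host pool
    block=True,   # block instead of opening extra connections when a pool is full
    retries=3,    # number of retries
    timeout=urllib3.Timeout(connect=5, read=10)
)

# Usage
response = http.request('GET', 'https://api.example.com')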
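Advanced usage - a custom retry policy:
python
import urllib3
from urllib3.util.retry import Retry

retry = Retry(
    total=3,
    backoff_factor=0.3,
    status_forcelist=[500, 502, 503, 504],
    # POST is not idempotent; include it only if the API tolerates duplicate requests
    allowed_methods=["HEAD", "GET", "OPTIONS", "POST"]
)
http = urllib3.PoolManager(retries=retry)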
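To make the idea behind `backoff_factor` concrete, here is a minimal standard-library sketch of retrying with exponential backoff. It is not urllib3's implementation; the helper names `backoff_delays` and `call_with_retry`, the doubling schedule, and the 10-second cap are all assumptions chosen for illustration.

```python
import time


def backoff_delays(retries, factor=0.3, cap=10.0):
    """Delay before each retry: factor * 2**attempt, capped at `cap` seconds."""
    return [min(cap, factor * (2 ** attempt)) for attempt in range(retries)]


def call_with_retry(func, retries=3, factor=0.3,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a retryable error wait, then try again, up to `retries` times."""
    delays = backoff_delays(retries, factor)
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])


# Hypothetical flaky operation: fails twice, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retry(flaky, sleep=lambda seconds: None))  # skip real sleeping in the demo
```

Passing `sleep` as a parameter keeps the helper easy to test; in real code the default `time.sleep` would apply the delays.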
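2. Data Models and Validation
2.1 pydantic - data models and validation
Role: type-annotated data models with runtime validation; FastAPI builds on it by default
Install (the email extra pulls in email-validator, which EmailStr requires):
bash
pip install "pydantic[email]"
Basic model:
python
from pydantic import BaseModel, EmailStr, Field
from typing import Optional, List
from datetime import datetime

class User(BaseModel):
    id: int
    name: str
    email: EmailStr
    age: int = Field(ge=0, le=150, description="User age")
    tags: Optional[List[str]] = None
    created_at: datetime = Field(default_factory=datetime.now)

# Creating an instance validates the data
user = User(
    id=1,
    name="张三",
    email="zhangsan@example.com",
    age=25,
    tags=["admin", "vip"]
)
print(user.name)               # 张三
print(user.model_dump())       # to a dict (Pydantic V2; V1 used .dict())
print(user.model_dump_json())  # to a JSON string (Pydantic V2; V1 used .json())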
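Validation failure example:
python
from pydantic import ValidationError

try:
    User(id=1, name="李四", email="invalid-email", age=200)
except ValidationError as e:
    # The message lists every failing field (here: email format and age upper bound)
    print(e)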
Nested models:
python
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    street: str
    zipcode: str

class UserWithAddress(BaseModel):
    name: str
    address: Address

# The nested dict is validated and converted into an Address instance
user = UserWithAddress(
    name="王五",
    address={"city": "北京", "street": "长安街", "zipcode": "100000"}
)
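Type coercion:
python
from pydantic import BaseModel

class Product(BaseModel):
    price: float
    quantity: int

    # A regular helper method (not a validator or converter)
    def total(self) -> float:
        return self.price * self.quantity

# Create from a dict; numeric strings are coerced to float and int
data = {"price": "99.9", "quantity": "5"}
product = Product(**data)
print(product.total())  # 499.5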
Custom validators in Pydantic V2:
python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    email: str

    # field_validator replaces the V1 @validator decorator
    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('invalid email address')
        return v.lower()
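2.2 attrs / dataclasses - declarative data classes
Built-in dataclasses (Python 3.7+)
No installation needed; part of the standard library since Python 3.7
python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
from datetime import datetime

@dataclass
class User:
    name: str
    age: int
    email: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    # compare=False keeps the timestamp out of ==; otherwise two instances
    # created at different moments would never compare equal
    created_at: datetime = field(default_factory=datetime.now, compare=False)

# Create an instance
user = User(name="张三", age=25)
print(user)  # User(name='张三', age=25, ...)

# Generated methods
print(user == User(name="张三", age=25))  # True
print(asdict(user))  # to a dict (recurses into nested dataclasses)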
Frozen (immutable) instances:
python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    host: str
    port: int

config = Config(host="localhost", port=8080)
# config.port = 9090  # raises dataclasses.FrozenInstanceError
attrs - a more powerful declarative class library
Install:
bash
pip install attrs
python
import attr
from typing import List, Optional

@attr.s
class User:
    name = attr.ib(type=str)
    age = attr.ib(type=int)
    email = attr.ib(type=Optional[str], default=None)
    tags = attr.ib(type=List[str], factory=list)

    # Custom validator, run on every instantiation
    @age.validator
    def _check_age(self, attribute, value):
        if value < 0 or value > 150:
            raise ValueError("age must be between 0 and 150")

# Create an instance
user = User(name="李四", age=30)

# Convert to a dict (feed it to json.dumps for JSON)
print(attr.asdict(user))
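Comparison:
| Feature | dataclasses | attrs |
|---|---|---|
| Installation | Built in | Requires pip install |
| Validation | Manual (for example in `__post_init__`) | Built-in validators |
| Performance | Good | Good (comparable in most uses) |
| Features | Basic | Rich (validators, converters, and more) |
Recommendation: use dataclasses for simple cases, and attrs or pydantic when you need non-trivial validation.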
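Since dataclasses leave validation to you, the usual place for it is `__post_init__`, which runs right after the generated `__init__`. The sketch below shows one way to do that; the `User` fields and the 0-150 age range simply mirror the attrs example above and are illustrative.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class User:
    name: str
    age: int
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # dataclasses do not check types or ranges, so do it by hand
        if not isinstance(self.age, int):
            raise TypeError("age must be an int")
        if not 0 <= self.age <= 150:
            raise ValueError("age must be between 0 and 150")


user = User(name="alice", age=25)
print(asdict(user))

try:
    User(name="bob", age=200)
except ValueError as exc:
    print(f"rejected: {exc}")
```

This only validates at construction time; later attribute assignments are not re-checked unless the class is frozen or you add your own setters.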
3. Configuration Management
3.1 python-dotenv - load configuration from .env
Role: loads environment variables from a .env file; a staple of 12-factor apps
Install:
bash
pip install python-dotenv
The .env file:
bash
# .env
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
SECRET_KEY=your-secret-key-here
DEBUG=True
API_KEY=sk-xxxxx
REDIS_HOST=localhost
REDIS_PORT=6379
Basic usage:
python
from dotenv import load_dotenv
import os

# Load the .env file (by default, variables already set in the environment are not overridden)
load_dotenv()

# Read environment variables; every value arrives as a string
database_url = os.getenv('DATABASE_URL')
secret_key = os.getenv('SECRET_KEY')
debug = os.getenv('DEBUG', 'False').lower() == 'true'
print(f"Database: {database_url}")
print(f"Debug: {debug}")
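Because every environment variable is a string, it often helps to centralize the conversion to `bool` and `int` instead of repeating `.lower() == 'true'` throughout the code. The helpers below are a standard-library sketch; the names `env_bool` and `env_int` and the accepted spellings of true/false are assumptions, not part of python-dotenv.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, default=False, env=None):
    """Read a boolean setting; raise on values that are neither true-like nor false-like."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a valid boolean")


def env_int(name, default, env=None):
    """Read an integer setting, falling back to `default` when it is unset."""
    env = os.environ if env is None else env
    raw = env.get(name)
    return default if raw is None else int(raw)


# A plain dict stands in for os.environ in this demo
sample = {"DEBUG": "True", "REDIS_PORT": "6379"}
print(env_bool("DEBUG", env=sample), env_int("REDIS_PORT", 6379, env=sample))
```

Failing loudly on an unrecognized value such as `DEBUG=maybe` tends to be safer than silently treating it as false.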
Loading a specific file:
python
from dotenv import load_dotenv
from pathlib import Path

env_path = Path('.') / '.env.production'
load_dotenv(dotenv_path=env_path)
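Automatic discovery (recommended):
python
# At the application entry point
from dotenv import load_dotenv
load_dotenv()  # looks for a .env file automatically

# Or locate the file explicitly with find_dotenv
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())  # searches the current directory, then each parent directory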
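Using it with FastAPI (in Pydantic V2, BaseSettings moved to the separate pydantic-settings package: pip install pydantic-settings):
python
from fastapi import FastAPI
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # extra="ignore" skips .env keys that have no matching field (API_KEY, REDIS_HOST, ...)
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    database_url: str
    secret_key: str
    debug: bool = False

settings = Settings()
app = FastAPI(debug=settings.debug)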
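4. Logging
4.1 loguru - a friendlier logging library
Role: a modern logging library that works out of the box with no configuration
Install:
bash
pip install loguru
Basic usage:
python
from loguru import logger

# Works immediately, no setup required
logger.debug("debug message")
logger.info("info message")
logger.warning("warning message")
logger.error("error message")
logger.critical("critical message")

# With formatting
user_id = 123
logger.info(f"User {user_id} logged in")

# With the exception traceback
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("Division by zero")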
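File output:
python
from loguru import logger

# Log to a file, rotating at 10 MB and deleting files older than 7 days
logger.add("app.log", rotation="10 MB", retention="7 days")

# One file per day, rotated at midnight
logger.add("logs/{time:YYYY-MM-DD}.log", rotation="00:00")

# Compress rotated files (compression is applied when a file is rotated or closed)
logger.add("logs/app.log", rotation="10 MB", compression="zip")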
Custom format:
python
from loguru import logger

logger.add(
    "app.log",
    format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {name}:{function}:{line} | {message}",
    level="INFO",
    colorize=False  # keep ANSI color codes out of the log file
)
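Context information:
python
from loguru import logger

# Context lives in record["extra"]; include {extra} in the format to print it
logger.add("app.log", format="{time} | {level} | {extra} | {message}")

# bind() returns a logger that carries the given fields
logger.bind(user_id=123, request_id="abc").info("user action")

# contextualize() attaches fields to every record emitted inside the block
with logger.contextualize(request_id="abc"):
    logger.info("processing request")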
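Integrating with the standard logging module:
python
import logging
from loguru import logger

class LoguruHandler(logging.Handler):
    """Forward standard logging records to loguru."""
    def emit(self, record):
        # Map the stdlib level name to a loguru level; fall back to the numeric level
        try:
            level = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno
        logger.opt(depth=1, exception=record.exc_info).log(level, record.getMessage())

root = logging.getLogger()
root.setLevel(logging.INFO)  # the root logger defaults to WARNING
root.addHandler(LoguruHandler())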
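4.2 structlog - structured logging
Role: emits structured logs (for example JSON) that log pipelines can collect and analyze
Install:
bash
pip install structlog
Basic configuration:
python
import logging
import structlog

# This configuration routes events through stdlib logging, so that must be
# configured too; otherwise filter_by_level drops everything below WARNING
logging.basicConfig(format="%(message)s", level=logging.INFO)

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)
logger = structlog.get_logger()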
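Usage example:
python
# Structured event with key-value fields
logger.info("user_login", user_id=123, ip="192.168.1.1")
# Emits one JSON object per event, for example (key order may differ):
# {"event": "user_login", "user_id": 123, "ip": "192.168.1.1", "logger": "__main__", "level": "info", "timestamp": "..."}

# bind() returns a new logger, so bindings can be chained
logger = logger.bind(user_id=123)
logger = logger.bind(request_id="abc-123")
logger.info("processing_request")  # carries both user_id and request_id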
Integrating with FastAPI:
python
from fastapi import FastAPI, Request
import structlog

app = FastAPI()
logger = structlog.get_logger()

@app.middleware("http")
async def log_requests(request: Request, call_next):
    logger.info("request_started", method=request.method, path=request.url.path)
    response = await call_next(request)
    logger.info("request_completed", status=response.status_code)
    return response
Comparison:
| Scenario | Recommendation |
|---|---|
| Local development and debugging | loguru |
| Production, ELK ingestion | structlog |
| Logs read by humans | loguru |
| Logs parsed by machines | structlog |
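To show what "machine-parseable" means in practice, here is a standard-library sketch that emits one JSON object per log record. It is not how structlog is implemented; `JsonFormatter`, `make_logger`, and the `context` key passed through `extra` are names invented for this illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record):
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "logger": record.name,
        }
        # Fields passed as extra={"context": {...}} are merged into the output
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, ensure_ascii=False)


def make_logger(stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("json_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
log.info("user_login", extra={"context": {"user_id": 123, "ip": "192.168.1.1"}})
print(buffer.getvalue(), end="")
```

A log collector can then load each line with a JSON parser and filter on fields such as `user_id`, with no regular expressions over free-form text.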
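5. Terminal Output and Command-Line Interfaces
5.1 rich - beautiful terminal output
Role: renders rich text, tables, progress bars, and more in the terminal
Install:
bash
pip install rich
Basic output:
python
from rich import print

# Strings are parsed as rich console markup ([style]...[/style]), not Markdown
print("[bold red]Error[/bold red]: file not found")
print("[green]✓[/green] Operation succeeded")

# Print a complex object
data = {"name": "张三", "age": 25, "skills": ["Python", "Go"]}
print(data)  # Python containers are pretty-printed with syntax highlighting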
Tables:
python
from rich.table import Table
from rich.console import Console

console = Console()
table = Table(title="User list")
table.add_column("ID", style="cyan")
table.add_column("Name", style="magenta")
table.add_column("Email", style="green")
table.add_row("1", "张三", "zhang@example.com")
table.add_row("2", "李四", "li@example.com")
table.add_row("3", "王五", "wang@example.com")
console.print(table)
Progress bar:
python
from rich.progress import Progress
import time

with Progress() as progress:
    task = progress.add_task("Processing...", total=100)
    for i in range(100):
        time.sleep(0.05)
        progress.update(task, advance=1)
Multiple progress bars:
python
import time
from rich.progress import Progress, SpinnerColumn, BarColumn, TextColumn

with Progress(
    SpinnerColumn(),
    TextColumn("[progress.description]{task.description}"),
    BarColumn(),
    TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
) as progress:
    task1 = progress.add_task("Downloading...", total=100)
    task2 = progress.add_task("Extracting...", total=50)
    for i in range(100):
        progress.update(task1, advance=1)
        time.sleep(0.02)
    for i in range(50):
        progress.update(task2, advance=1)
        time.sleep(0.02)
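Panels and layouts:
python
from rich.console import Console
from rich.panel import Panel
from rich.layout import Layout

console = Console()
layout = Layout()
# Stack three regions vertically; header and footer have a fixed height
layout.split_column(
    Layout(name="header", size=3),
    Layout(name="body"),
    Layout(name="footer", size=3)
)
layout["header"].update(Panel("Application title", style="bold blue"))
layout["body"].update(Panel("Main content"))
layout["footer"].update(Panel("Status bar"))
console.print(layout)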
5.2 typer - build command-line apps quickly
Role: a modern CLI framework driven by type hints; built on top of Click by the author of FastAPI
Install:
bash
pip install typer
Basic CLI:
python
import typer

app = typer.Typer()

@app.command()
def hello(name: str = typer.Argument(..., help="User name")):
    """Say hello"""
    typer.echo(f"Hello {name}!")

@app.command()
def greet(name: str, age: int = 18, verbose: bool = False):
    """Greeting with options"""
    if verbose:
        typer.echo(f"Name: {name}, age: {age}")
    else:
        typer.echo(f"Hi {name}!")

if __name__ == "__main__":
    app()
Run:
bash
python cli.py hello 张三
python cli.py greet 李四 --age 25 --verbose
python cli.py --help
Sub-command groups:
python
import typer

app = typer.Typer()
users_app = typer.Typer()
app.add_typer(users_app, name="users")

@users_app.command()
def create(name: str):
    """Create a user"""
    typer.echo(f"Creating user: {name}")

@users_app.command()
def delete(user_id: int):
    """Delete a user"""
    typer.echo(f"Deleting user ID: {user_id}")

# Name the command explicitly so the function does not shadow the built-in list
@users_app.command("list")
def list_users():
    """List users"""
    typer.echo("User list...")

if __name__ == "__main__":
    app()
Run:
bash
python cli.py users create 张三
python cli.py users list
With rich output:
python
import typer
from rich.table import Table
from rich.console import Console

app = typer.Typer()

@app.command()
def list_users():
    """List users (pretty output)"""
    console = Console()
    table = Table(title="User list")
    table.add_column("ID")
    table.add_column("Name")
    table.add_column("Status")
    table.add_row("1", "张三", "✓ active")
    table.add_row("2", "李四", "✗ disabled")
    console.print(table)
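Shell completion:
bash
# Typer adds these options to every app; they apply once the app is installed as a command
# (my-cli is a placeholder for your own installed entry point)
my-cli --install-completion
my-cli --show-completion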
5.3 click - a mature command-line framework
Role: a full-featured CLI framework from the Pallets project, the team behind Flask
Install:
bash
pip install click
Basic usage:
python
import click

@click.command()
@click.argument('name')
@click.option('--count', default=1, help='Number of repetitions')
@click.option('--verbose', is_flag=True, help='Verbose output')
def hello(name, count, verbose):
    """Say hello"""
    for i in range(count):
        if verbose:
            click.echo(f"[{i+1}] Hello {name}!")
        else:
            click.echo(f"Hello {name}!")

if __name__ == '__main__':
    hello()
Command groups:
python
import click

@click.group()
def cli():
    """Command-line tool"""
    pass

@cli.command()
@click.option('--name', default='World', help='User name')
def greet(name):
    """Say hello"""
    click.echo(f"Hello {name}!")

@cli.command()
@click.argument('file')
def upload(file):
    """Upload a file"""
    click.echo(f"Uploading file: {file}")

if __name__ == '__main__':
    cli()
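Interactive input:
python
import click

@click.command()
def interactive():
    """Interactive command"""
    name = click.prompt('Enter your name')
    age = click.prompt('Enter your age', type=int)  # re-prompts until an integer is entered
    confirm = click.confirm('Submit?')
    if confirm:
        click.echo(f"Submitted: {name}, {age} years old")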
5.4 argparse - the built-in argument parser
Role: part of the Python standard library, no installation needed; a good fit for simple scripts
python
import argparse

parser = argparse.ArgumentParser(description='File processing tool')

# Positional argument
parser.add_argument('input', help='input file')

# Optional arguments
parser.add_argument('-o', '--output', help='output file')
parser.add_argument('-v', '--verbose', action='store_true', help='verbose output')
parser.add_argument('-n', '--count', type=int, default=1, help='number of passes')
parser.add_argument('--format', choices=['json', 'csv', 'xml'], default='json')

args = parser.parse_args()
print(f"Input: {args.input}")
print(f"Output: {args.output}")
print(f"Verbose: {args.verbose}")
Run:
bash
python script.py input.txt -o output.json -v -n 5 --format csv
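argparse can also express the sub-command style shown for typer and click, using `add_subparsers`. The following is a small sketch; the `users` program name and the `create`/`delete` commands are illustrative, and the argument list is passed explicitly so the example runs without a real command line.

```python
import argparse


def build_parser():
    """Build a parser with two sub-commands: create and delete."""
    parser = argparse.ArgumentParser(prog="users", description="User management")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="create a user")
    create.add_argument("name")
    create.add_argument("--age", type=int, default=18)

    delete = sub.add_parser("delete", help="delete a user")
    delete.add_argument("user_id", type=int)
    return parser


# Equivalent to running: users create alice --age 25
args = build_parser().parse_args(["create", "alice", "--age", "25"])
print(args.command, args.name, args.age)
```

In a real script you would call `parse_args()` with no arguments so that it reads `sys.argv`, then dispatch on `args.command`.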
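6. Async and Concurrency
Python offers several approaches to concurrent programming, from asyncio in the standard library to high-performance third-party libraries. This section covers the core tools.
6.1 asyncio - the standard-library async framework
Role: the async I/O framework in the standard library since Python 3.4 (async/await syntax since 3.5, asyncio.run() since 3.7); the foundation of async programming in Python
Install: standard library, nothing to install
Basic concepts:
python
import asyncio

# Define a coroutine
async def say_hello():
    print("Hello")
    await asyncio.sleep(1)  # non-blocking wait
    print("World")

# Run the coroutine
asyncio.run(say_hello())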
Running several tasks concurrently:
python
import asyncio

async def fetch_data(task_id, delay):
    print(f"Task {task_id} started")
    await asyncio.sleep(delay)  # simulate an I/O operation
    print(f"Task {task_id} finished")
    return f"result {task_id}"

async def main():
    # Option 1: gather - wait for all tasks; results keep the argument order
    results = await asyncio.gather(
        fetch_data(1, 2),
        fetch_data(2, 1),
        fetch_data(3, 3)
    )
    print(f"All results: {results}")

    # Option 2: create_task - start running in the background
    task1 = asyncio.create_task(fetch_data(4, 1))
    task2 = asyncio.create_task(fetch_data(5, 2))
    # Do other work...
    await asyncio.sleep(0.5)
    # Wait for the results
    r1 = await task1
    r2 = await task2
    print(f"Results: {r1}, {r2}")

asyncio.run(main())
Timeouts:
python
import asyncio

async def long_task():
    await asyncio.sleep(10)
    return "done"

async def main():
    try:
        # 5-second timeout; the task is cancelled when it expires
        result = await asyncio.wait_for(long_task(), timeout=5)
        print(result)
    except asyncio.TimeoutError:
        print("Task timed out")

asyncio.run(main())
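Task groups (Python 3.11+):
python
import asyncio

# fetch_data is the coroutine defined in the concurrent-tasks example above
async def main():
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(fetch_data(1, 2))
        task2 = tg.create_task(fetch_data(2, 1))
    # Leaving the block waits for every task;
    # if any task fails, the remaining tasks are cancelled
    print(task1.result(), task2.result())

asyncio.run(main())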
Producer-consumer pattern:
python
import asyncio

async def producer(queue, n_items):
    for i in range(n_items):
        await queue.put(f"item-{i}")
        print(f"Produced: item-{i}")
    await queue.put(None)  # sentinel: no more items

async def consumer(queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        print(f"Consumed: {item}")
        await asyncio.sleep(0.5)
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    # Run the producer and the consumer concurrently
    await asyncio.gather(
        producer(queue, 5),
        consumer(queue)
    )

asyncio.run(main())
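`asyncio.gather` starts everything at once, which can overwhelm a remote service when there are hundreds of jobs. A common remedy is to cap concurrency with `asyncio.Semaphore`. The sketch below assumes a helper named `bounded_gather` (not part of asyncio) and uses `asyncio.sleep` to stand in for real I/O.

```python
import asyncio


async def bounded_gather(coro_funcs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(func):
        async with semaphore:  # wait here while `limit` jobs are already running
            return await func()

    return await asyncio.gather(*(run(func) for func in coro_funcs))


async def demo(n_jobs=10, limit=3):
    running = 0
    peak = 0

    async def job(i):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)  # track the highest concurrency observed
        await asyncio.sleep(0.01)  # stand-in for a network call
        running -= 1
        return i * i

    results = await bounded_gather([lambda i=i: job(i) for i in range(n_jobs)], limit)
    return results, peak


results, peak = asyncio.run(demo())
print(results, "peak concurrency:", peak)
```

Results still come back in submission order because `gather` preserves it; only the number of jobs running at the same time is limited.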
6.2 aiohttp / httpx - async HTTP
aiohttp
Role: a long-established async HTTP client and server
Install:
bash
pip install aiohttp
Client usage:
python
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.json()

async def main():
    async with aiohttp.ClientSession() as session:
        # A single request
        data = await fetch(session, 'https://api.github.com/users/octocat')
        print(data)

        # Concurrent requests
        urls = [
            'https://api.github.com/users/octocat',
            'https://api.github.com/repos/octocat/Hello-World',
            'https://api.github.com/orgs/github'
        ]
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        for result in results:
            print(result)

asyncio.run(main())
Streaming responses:
python
async def download_file(session, url, filename):
    async with session.get(url) as response:
        response.raise_for_status()
        with open(filename, 'wb') as f:
            # Read in chunks to keep memory usage low
            async for chunk in response.content.iter_chunked(8192):
                f.write(chunk)
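Connection pool configuration:
python
import aiohttp
import asyncio

async def main():
    # Configure the connection pool
    connector = aiohttp.TCPConnector(
        limit=100,          # total simultaneous connections
        limit_per_host=10,  # simultaneous connections per host
        ttl_dns_cache=300,  # DNS cache lifetime in seconds
        use_dns_cache=True,
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        # Issue requests through the session
        pass

asyncio.run(main())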
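httpx (async side)
Role: a modern HTTP library with both sync and async APIs
Install:
bash
pip install httpx
Async usage (the basics are covered in the HTTP section; this adds a few more advanced points):
python
import httpx
import asyncio

async def main():
    # retries=3 retries failed connection attempts only, not error responses
    transport = httpx.AsyncHTTPTransport(retries=3)
    async with httpx.AsyncClient(transport=transport) as client:
        # Request with an explicit timeout
        response = await client.get(
            'https://api.example.com/data',
            timeout=30.0
        )

        # Streaming upload from an async generator
        async def stream_data():
            for i in range(10):
                yield f"chunk {i}\n".encode()

        response = await client.post(
            'https://api.example.com/upload',
            content=stream_data()
        )

asyncio.run(main())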
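Comparison:
| Feature | aiohttp | httpx |
|---|---|---|
| Maturity | Long-established and stable | Newer but actively developed |
| API design | More involved | Simple and consistent |
| Sync support | Needs an extra wrapper | Native |
| HTTP/2 | Not supported | Optional (install `httpx[http2]`) |
| Best for | Complex async applications | New projects, code that needs both sync and async |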
6.3 trio / curio - alternative async runtimes
trio
Role: an async framework designed for safety, built around "structured concurrency"
Install:
bash
pip install trio
Basic usage:
python
import trio

async def worker(id, n):
    for i in range(n):
        print(f"Worker {id}: task {i}")
        await trio.sleep(0.1)

async def main():
    # Structured concurrency: the nursery waits for its child tasks
    async with trio.open_nursery() as nursery:
        nursery.start_soon(worker, 1, 5)
        nursery.start_soon(worker, 2, 5)
        nursery.start_soon(worker, 3, 5)
    # Execution only reaches this point after every child task has finished

trio.run(main)
Timeouts and cancellation:
python
import trio

async def long_task():
    await trio.sleep(10)
    return "done"

async def main():
    try:
        # The enclosed work is cancelled automatically on timeout
        with trio.fail_after(5):  # 5-second timeout
            result = await long_task()
            print(result)
    except trio.TooSlowError:
        print("Task timed out and was cancelled")

trio.run(main)
Communicating over channels:
python
import trio

async def producer(send_channel, n):
    for i in range(n):
        await send_channel.send(f"item-{i}")
        print(f"Produced: item-{i}")
    await send_channel.aclose()  # closing ends the consumer's async for loop

async def consumer(receive_channel):
    async for item in receive_channel:
        print(f"Consumed: {item}")
        await trio.sleep(0.1)

async def main():
    # Buffer size 0: send() waits until the consumer is ready to receive
    send_channel, receive_channel = trio.open_memory_channel(0)
    async with trio.open_nursery() as nursery:
        nursery.start_soon(producer, send_channel, 5)
        nursery.start_soon(consumer, receive_channel)

trio.run(main)
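Advantages of trio:
- Structured concurrency: the lifetime of child tasks is managed automatically
- Cancellation propagation: cancelling a parent cancels all of its children
- No hidden state: there is no global event loop
- A safer API: fewer ways to make common mistakes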
curio
Role: a compact async library that emphasizes simplicity
Install:
bash
pip install curio
python
import curio

async def worker(name, n):
    for i in range(n):
        print(f"{name}: {i}")
        await curio.sleep(0.1)

async def main():
    # The task group waits for all spawned tasks when the block exits
    async with curio.TaskGroup() as g:
        await g.spawn(worker, "A", 5)
        await g.spawn(worker, "B", 5)

curio.run(main)
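Which one to pick:
| Framework | Best for |
|---|---|
| asyncio | The standard library default; a general-purpose choice |
| trio | Projects that need strict structured concurrency |
| curio | Small projects that value simplicity |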
6.4 concurrent.futures - thread and process pools
Role: standard library; simplifies working with thread pools and process pools
Install: standard library, nothing to install
Thread pool (I/O-bound work):
python
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fetch_url(url):
    response = requests.get(url, timeout=10)
    return f"{url}: {response.status_code}"

urls = [
    'https://www.google.com',
    'https://www.github.com',
    'https://www.python.org',
]

# Option 1: submit + as_completed (results arrive in completion order)
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_url, url) for url in urls]
    for future in as_completed(futures):
        print(future.result())

# Option 2: map (results keep the input order)
with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(fetch_url, urls)
    for result in results:
        print(result)
Process pool (CPU-bound work):
python
from concurrent.futures import ProcessPoolExecutor
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

# The __main__ guard is required on platforms that spawn worker processes
# (Windows, macOS): workers re-import this module on start-up
if __name__ == '__main__':
    numbers = [10**15 + i for i in range(100)]
    # Compute in parallel across processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = executor.map(is_prime, numbers)
        primes = [n for n, is_p in zip(numbers, results) if is_p]
    print(f"Found {len(primes)} primes")
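Timeouts:
python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def slow_task():
    time.sleep(10)
    return "done"

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_task)
    try:
        result = future.result(timeout=5)  # wait at most 5 seconds
        print(result)
    except TimeoutError:
        # The timeout only stops the wait; the thread keeps running, and
        # leaving the with block still waits for slow_task to finish
        print("Task timed out")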
Callbacks:
python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    return n * n

def callback(future):
    print(f"Task finished, result: {future.result()}")

with ThreadPoolExecutor() as executor:
    future = executor.submit(task, 5)
    future.add_done_callback(callback)  # called once the future completes
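`future.result()` re-raises any exception thrown inside the worker, so one bad input can abort a whole batch unless each future is handled separately. This sketch collects successes and failures side by side; `parse_port` and `parse_all` are illustrative names, and parsing port numbers merely stands in for real work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_port(text):
    """Parse a TCP port number, raising ValueError on bad input."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def parse_all(items, max_workers=4):
    """Return (ok, failed) dicts keyed by the original input."""
    ok, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the input that produced it
        future_to_item = {executor.submit(parse_port, item): item for item in items}
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            try:
                ok[item] = future.result()  # re-raises the worker's exception
            except ValueError as exc:
                failed[item] = str(exc)
    return ok, failed


ok, failed = parse_all(["80", "abc", "70000", "443"])
print("ok:", ok)
print("failed:", sorted(failed))
```

The `future_to_item` dictionary is the key design choice: `as_completed` yields futures in completion order, so the mapping is what ties each result to its input.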
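6.5 multiprocessing - multiple processes
Role: standard library; true parallel execution (each process has its own interpreter, so the GIL is not a bottleneck)
Install: standard library, nothing to install
Basic usage:
python
from multiprocessing import Process, cpu_count
import os

def worker(name):
    # A Process discards its target's return value; use a Queue or Pipe to send results back
    print(f"Process {name} started, PID: {os.getpid()}")

if __name__ == '__main__':
    processes = []
    for i in range(cpu_count()):
        p = Process(target=worker, args=(f"worker-{i}",))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()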
Inter-process communication - Queue:
python
from multiprocessing import Process, Queue

def producer(queue):
    for i in range(5):
        queue.put(f"item-{i}")
        print(f"Produced: item-{i}")
    queue.put(None)  # sentinel: no more items

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Consumed: {item}")

if __name__ == '__main__':
    queue = Queue()
    p1 = Process(target=producer, args=(queue,))
    p2 = Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
Inter-process communication - Pipe:
python
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send("Hello from worker")
    msg = conn.recv()
    print(f"Worker received: {msg}")
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # two connected endpoints, duplex by default
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(f"Main received: {parent_conn.recv()}")
    parent_conn.send("Hello from main")
    p.join()
Shared memory:
python
from multiprocessing import Process, Value, Array

def increment(counter, arr, idx):
    # get_lock() guards the read-modify-write on the shared counter
    with counter.get_lock():
        counter.value += 1
    arr[idx] = idx * idx

if __name__ == '__main__':
    counter = Value('i', 0)  # shared integer
    arr = Array('i', 10)     # shared array of 10 integers
    processes = []
    for i in range(5):
        p = Process(target=increment, args=(counter, arr, i))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print(f"Counter: {counter.value}")
    print(f"Array: {list(arr)}")
Process pool:
python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # map: blocks until every result is ready
        results = pool.map(square, range(10))
        print(results)

        # imap: yields results lazily, in order
        for result in pool.imap(square, range(10)):
            print(result)

        # apply_async: returns immediately; get() blocks for the result
        async_result = pool.apply_async(square, (5,))
        print(async_result.get())
Manager - sharing complex objects across processes:
python
from multiprocessing import Process, Manager

def worker(shared_dict, shared_list, idx):
    shared_dict[idx] = f"value-{idx}"
    shared_list.append(idx)

if __name__ == '__main__':
    # The manager runs a server process; dict() and list() return proxies to it
    with Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list()
        processes = []
        for i in range(5):
            p = Process(target=worker, args=(shared_dict, shared_list, i))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        print(f"Dict: {dict(shared_dict)}")
        print(f"List: {list(shared_list)}")
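6.6 uvloop - a faster event loop
Role: a drop-in replacement for the asyncio event loop built on libuv; the project reports a 2-4x speedup over the default loop (not available on Windows)
Install:
bash
pip install uvloop
Basic usage:
python
import asyncio
import uvloop

# Make uvloop the default event loop
uvloop.install()

async def main():
    print("Running on uvloop")
    await asyncio.sleep(1)

asyncio.run(main())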
Performance comparison example:
python
import asyncio
import time
import uvloop

async def worker(n):
    await asyncio.sleep(0.001)
    return n

async def benchmark():
    # Sleeping tasks mostly measure scheduling overhead, not real network I/O
    start = time.perf_counter()
    tasks = [worker(i) for i in range(10000)]
    await asyncio.gather(*tasks)
    return time.perf_counter() - start

# With the default event loop
print(f"default loop: {asyncio.run(benchmark()):.3f}s")

# With uvloop
uvloop.install()
print(f"uvloop: {asyncio.run(benchmark()):.3f}s")
Using it with FastAPI:
python
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

if __name__ == "__main__":
    # With its default loop setting ("auto"), uvicorn uses uvloop when it is installed
    uvicorn.run(app, host="0.0.0.0", port=8000)
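When to use it:
| Scenario | Recommendation |
|---|---|
| High-concurrency I/O | ✅ Strongly recommended |
| CPU-bound work | ❌ No benefit; use multiple processes |
| Simple scripts | ⚠️ Little noticeable gain |
| Production | ✅ Recommended |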
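6.7 Comparing Concurrency Models
Overview of concurrency models
                   Python concurrency
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
    Async I/O        Multithreading    Multiprocessing
    (asyncio)         (threading)     (multiprocessing)
        │                  │                  │
        ▼                  ▼                  ▼
  One thread,        Many threads      Many processes
  many coroutines
  Good for I/O       Good for I/O      Good for CPU
  High concurrency,  Limited by        Not limited by
  low overhead       the GIL           the GIL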
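Decision tree
Do you need parallel execution?
├── No (mostly waiting on I/O) → asyncio / aiohttp / httpx
│   └── Need maximum throughput → add uvloop
├── Yes (mostly CPU computation)
│   ├── Need shared state → multiprocessing + Manager (or Value/Array)
│   └── No shared state → concurrent.futures.ProcessPoolExecutor
└── Not sure → measure first, then decide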
Side-by-side example
python
import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import aiohttp
import requests

# Scenario 1: I/O-bound work (network requests)
def sync_fetch(url):
    return requests.get(url, timeout=10).text

async def async_fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

# Threads
def thread_fetch(urls):
    with ThreadPoolExecutor(max_workers=10) as executor:
        return list(executor.map(sync_fetch, urls))

# Async
async def async_fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [async_fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Scenario 2: CPU-bound work (computation)
def cpu_intensive(n):
    return sum(i * i for i in range(n))

# Processes (call this from under an if __name__ == '__main__' guard)
def process_compute(numbers):
    with ProcessPoolExecutor() as executor:
        return list(executor.map(cpu_intensive, numbers))
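Performance reference (illustrative figures; actual numbers depend on hardware, network, and workload)
| Scenario | Thread pool | Process pool | asyncio | Recommendation |
|---|---|---|---|---|
| Network requests (100 concurrent) | 2.5s | 5.0s | 0.8s | asyncio |
| File I/O (100 files) | 1.2s | 3.0s | 0.5s | asyncio |
| CPU computation (10 tasks) | 10s | 2.5s | 10s | Process pool |
| Mixed workload | 5s | 4s | 3s | asyncio + process pool |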
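For the mixed-workload row, the usual pattern is to keep I/O on the event loop and hand blocking or CPU-heavy calls to an executor with `loop.run_in_executor`. The sketch below takes the executor as a parameter: the demo passes a `ThreadPoolExecutor` so that it runs anywhere, while a `ProcessPoolExecutor` (under a `__main__` guard) is what would actually give CPU parallelism. The function names are illustrative.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def cpu_intensive(n):
    """Blocking computation that must not run directly on the event loop."""
    return sum(i * i for i in range(n))


async def io_task(delay):
    await asyncio.sleep(delay)  # stand-in for a network call
    return "io done"


async def mixed(executor, numbers):
    loop = asyncio.get_running_loop()
    # Each call runs in the executor; the loop stays free for I/O meanwhile
    cpu_futures = [loop.run_in_executor(executor, cpu_intensive, n) for n in numbers]
    io_result, *cpu_results = await asyncio.gather(io_task(0.01), *cpu_futures)
    return io_result, cpu_results


with ThreadPoolExecutor(max_workers=2) as pool:
    io_result, cpu_results = asyncio.run(mixed(pool, [10, 100]))
print(io_result, cpu_results)
```

Swapping executors does not change `mixed` at all, which makes it straightforward to measure both variants before committing to one.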
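7. Tool Comparison and Recommendations
Choosing an HTTP library
| Scenario | Recommendation |
|---|---|
| Simple synchronous requests | requests |
| Async / high throughput | httpx |
| Low-level control | urllib3 |
Choosing a data model library
| Scenario | Recommendation |
|---|---|
| API validation, FastAPI | pydantic |
| Simple data containers | dataclasses |
| Complex validation | attrs |
Choosing a logging library
| Scenario | Recommendation |
|---|---|
| Local development | loguru |
| Production / ELK | structlog |
Choosing a CLI framework
| Scenario | Recommendation |
|---|---|
| Rapid development, modern CLI | typer |
| Mature and stable, complex CLI | click |
| Simple scripts | argparse |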
8. Putting It All Together
A complete CLI tool
python
#!/usr/bin/env python3
"""
User management CLI tool.
Demonstrates several of the packages above working together.
"""
import asyncio
import os

import httpx
import typer
from dotenv import load_dotenv
from loguru import logger
from pydantic import BaseModel, EmailStr
from rich.console import Console
from rich.table import Table

# Load configuration
load_dotenv()

app = typer.Typer()
console = Console()
logger.add("app.log", rotation="1 MB")

class User(BaseModel):
    id: int
    name: str
    email: EmailStr

@app.command()
def list_users():
    """List all users"""
    logger.info("Fetching user list")
    # Simulated API call
    users = [
        User(id=1, name="张三", email="zhang@example.com"),
        User(id=2, name="李四", email="li@example.com"),
        User(id=3, name="王五", email="wang@example.com"),
    ]
    # Pretty output
    table = Table(title="User list")
    table.add_column("ID", style="cyan")
    table.add_column("Name", style="magenta")
    table.add_column("Email", style="green")
    for user in users:
        table.add_row(str(user.id), user.name, user.email)
    console.print(table)
    logger.info(f"Listed {len(users)} users")

async def _fetch_user(user_id: int) -> None:
    api_key = os.getenv('API_KEY')
    async with httpx.AsyncClient() as client:
        headers = {'Authorization': f'Bearer {api_key}'}
        response = await client.get(
            f'https://api.example.com/users/{user_id}',
            headers=headers
        )
        if response.status_code == 200:
            data = response.json()
            user = User(**data)
            console.print(f"[green]✓[/green] User: {user.name}")
        else:
            console.print(f"[red]✗[/red] Request failed: {response.status_code}")

@app.command()
def fetch_user(user_id: int):
    """Fetch a single user"""
    # Typer commands are plain functions, so the coroutine is run explicitly
    asyncio.run(_fetch_user(user_id))

@app.command()
def create_user(name: str, email: str):
    """Create a new user"""
    # Typer parameters use plain types; the User model validates the email
    user = User(id=0, name=name, email=email)
    logger.info(f"Created user: {user}")
    console.print(f"[green]✓[/green] User {name} created")

if __name__ == "__main__":
    app()
Example runs:
bash
# List users
python cli.py list-users
# Fetch a user
python cli.py fetch-user 123
# Create a user
python cli.py create-user 张三 zhang@example.com
# Show help
python cli.py --help
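Summary
Core utility packages
| Package | Main use | Rating |
|---|---|---|
| requests | HTTP requests | ⭐⭐⭐⭐⭐ |
| httpx | Async HTTP | ⭐⭐⭐⭐⭐ |
| pydantic | Data validation | ⭐⭐⭐⭐⭐ |
| loguru | Logging | ⭐⭐⭐⭐⭐ |
| rich | Terminal output | ⭐⭐⭐⭐⭐ |
| typer | CLI development | ⭐⭐⭐⭐⭐ |
| python-dotenv | Configuration management | ⭐⭐⭐⭐ |
| dataclasses | Data classes | ⭐⭐⭐⭐ |
| structlog | Structured logging | ⭐⭐⭐⭐ |
Async and concurrency
| Package | Main use | Rating |
|---|---|---|
| asyncio | Async framework (standard library) | ⭐⭐⭐⭐⭐ |
| aiohttp | Async HTTP client | ⭐⭐⭐⭐ |
| concurrent.futures | Thread and process pools (standard library) | ⭐⭐⭐⭐⭐ |
| multiprocessing | Multi-process parallelism (standard library) | ⭐⭐⭐⭐⭐ |
| uvloop | High-performance event loop | ⭐⭐⭐⭐ |
| trio | Structured concurrency | ⭐⭐⭐ |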
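Final advice:
- Don't reinvent the wheel - these tools have all been proven in wide use
- Stay consistent - standardize tool choices across the team
- Watch versions - follow package releases, especially breaking changes in major versions
- Use the docs - the official documentation is the most reliable teacher
- Use async with care - async is not a silver bullet; use it for I/O-bound work and use multiple processes for CPU-bound work
- Measure first - find the bottleneck with a profiler before deciding what to optimize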
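As a starting point for "measure first", the standard library's `timeit` module is enough to compare two implementations before reaching for a full profiler such as `cProfile`. This is a minimal sketch; the two `squares_*` functions and the `measure` helper are illustrative, and the timings will vary from machine to machine.

```python
import timeit


def squares_loop(n):
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_comprehension(n):
    return [i * i for i in range(n)]


def measure(func, n=1000, repeat=3, number=100):
    """Best-of-`repeat` average time per call, in seconds."""
    timings = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
    return min(timings) / number


for func in (squares_loop, squares_comprehension):
    print(f"{func.__name__}: {measure(func) * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes; the fastest run is usually the closest to what the code itself costs.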