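Python FastAPI Performance Optimization in Practice: 8 Techniques to Make Your API 3x Faster

FastAPI is popular, but many projects only find out after going live that their API is slow.

FastAPI itself is not the problem. The problem is the hidden performance killers people overlook: unpooled database connections, synchronous calls that block the event loop, N+1 queries everywhere, and so on.

This article skips the theory and covers only the pitfalls I hit in production projects and how I fixed them, most of them with before-and-after load-test numbers.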

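1. Database Connection Pooling: The First Thing to Optimize

1.1 The problem

The most common beginner mistake: creating a new database connection for every request.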

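from sqlalchemy import create_engine
from sqlalchemy.orm import Session

# ❌ A new connection for every request
@app.get("/users/{user_id}")
async def get_user(user_id: int):
    # Every request builds a new engine and connection: TCP handshake + SSL + authentication
    engine = create_engine("postgresql://user:pass@localhost/db")
    with Session(engine) as session:
        user = session.get(User, user_id)
    engine.dispose()  # thrown away after a single use
    return user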

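Load-test results (1,000 concurrent requests, each fetching a single user):

Average response time: 340ms
P99: 1200ms
Error rate: 3.2% (connection timeouts)

1.2 Solution: a global connection pool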

from fastapi import Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

# ✅ Create the pool once at application startup; all requests share it
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/db",
    pool_size=20,          # connections kept open in the pool
    max_overflow=10,       # extra connections allowed under peak load
    pool_timeout=30,       # seconds to wait for a free connection
    pool_recycle=1800,     # recycle connections older than this (seconds)
    pool_pre_ping=True,    # check that a connection is alive before using it
)

AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)

# Dependency injection
async def get_db():
    # "async with" closes the session and returns its connection to the pool
    async with AsyncSessionLocal() as session:
        yield session

@app.get("/users/{user_id}")
async def get_user(user_id: int, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(User).where(User.id == user_id))
    return result.scalar_one_or_none()

Load-test comparison:

Average response time: 12ms (↓96%)
P99: 45ms (↓96%)
Error rate: 0%

One change, and the average response time dropped by a factor of 28 (340ms to 12ms).
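
To make the reuse idea concrete without a real database, here is a minimal sketch of what a pool does. `FakeConnection` and `TinyPool` are hypothetical names invented for this illustration; they are not SQLAlchemy's implementation and they omit overflow, timeouts, and health checks.

```python
import asyncio


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""
    created = 0  # how many connections have been opened in total

    def __init__(self):
        FakeConnection.created += 1


class TinyPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size: int):
        self._free = asyncio.Queue()
        for _ in range(size):
            self._free.put_nowait(FakeConnection())

    async def run(self, work):
        conn = await self._free.get()      # wait until a connection is free
        try:
            return await work(conn)
        finally:
            self._free.put_nowait(conn)    # hand it back for the next request


async def handle_request(conn):
    await asyncio.sleep(0.001)  # pretend to run a query
    return id(conn)


async def main():
    pool = TinyPool(size=5)  # created inside the running event loop
    await asyncio.gather(*(pool.run(handle_request) for _ in range(100)))
    return FakeConnection.created


connections_opened = asyncio.run(main())
print(connections_opened)  # 100 requests were served by 5 connections
```

The per-request version in 1.1 would have opened 100 connections here. The pool pays the connection setup cost five times and then only queues requests.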

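2. Async I/O: Don't Write Synchronous Code in async Functions

2.1 The problem

FastAPI is an async framework, yet many people call blocking synchronous operations inside async functions: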

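import json
import requests

# ❌ Synchronous I/O inside an async function
@app.get("/process")
async def process_data():
    # Synchronous file read: blocks the entire event loop
    with open("large_file.json", "r") as f:
        data = json.load(f)

    # Synchronous HTTP request: blocks the event loop
    resp = requests.get("https://api.example.com/data")

    # Synchronous database query: blocks the event loop
    result = sync_db.query(Model).all()

    return {"status": "done"}

Problem: a synchronous call in one request blocks the entire event loop, so every other request queues up behind it.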

2.2 Solutions

Option A: use the async version of the library

# ✅ Use async libraries
import json

import aiofiles
import httpx

@app.get("/process")
async def process_data():
    # Async file read
    async with aiofiles.open("large_file.json", "r") as f:
        data = json.loads(await f.read())

    # Async HTTP request
    async with httpx.AsyncClient() as client:
        resp = await client.get("https://api.example.com/data")

    return {"status": "done"}

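Option B: push operations that cannot be made async into a thread pool

import asyncio
import json
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def load_json(path: str):
    # Plain blocking read; the "with" block closes the file
    with open(path, "r") as f:
        return json.load(f)

@app.get("/process")
async def process_data():
    loop = asyncio.get_running_loop()

    # Run the blocking call in the thread pool so the event loop stays free
    data = await loop.run_in_executor(executor, load_json, "large_file.json")

    return {"status": "done", "records": len(data)}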

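2.3 Load-test comparison

10 concurrent requests, each doing one file read (100ms):

Synchronous I/O: total ~1000ms (requests wait in series)
Async I/O:       total ~110ms (handled concurrently)
Thread pool:     total ~300ms (4 threads, so 3 batches of ~100ms, plus context switching)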
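
The same three behaviours can be reproduced with the standard library alone. This is a sketch under stated assumptions: `time.sleep` stands in for blocking I/O, `asyncio.sleep` for an async library call, and `asyncio.to_thread` (Python 3.9+) for the thread-pool option. Absolute numbers will differ from the figures above.

```python
import asyncio
import time

DELAY = 0.02  # simulated I/O time per request, in seconds


async def blocking_handler():
    time.sleep(DELAY)            # blocks the whole event loop


async def async_handler():
    await asyncio.sleep(DELAY)   # yields to the loop while waiting


async def threaded_handler():
    # Blocking call moved to a worker thread; the loop keeps running
    await asyncio.to_thread(time.sleep, DELAY)


async def timed(handler, n=10):
    """Run n handlers concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def main():
    return {
        "blocking": await timed(blocking_handler),
        "async": await timed(async_handler),
        "threaded": await timed(threaded_handler),
    }


timings = asyncio.run(main())
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.0f}ms")
```

The blocking variant takes at least ten times `DELAY` because each `time.sleep` holds the loop, while the other two overlap their waits.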

3. N+1 Queries: The Number One Database Performance Killer

3.1 The problem

# ❌ N+1 queries: fetch the list, then fetch each relation one by one
@app.get("/articles")
async def list_articles(db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(Article))
    articles = result.scalars().all()

    # Every article triggers one extra query for its author
    for article in articles:
        # 20 articles = 1 + 20 = 21 SQL statements
        await db.refresh(article, ["author"])

    return [
        {"title": a.title, "author": a.author.name}
        for a in articles
    ]

3.2 Solution: joinedload / selectinload

from sqlalchemy.orm import joinedload, selectinload

# ✅ Option 1: JOIN query (suits one-to-one relationships and small related data sets)
@app.get("/articles")
async def list_articles(db: AsyncSession = Depends(get_db)):
    result = await db.execute(
        select(Article).options(joinedload(Article.author))
    )
    articles = result.unique().scalars().all()

    return [
        {"title": a.title, "author": a.author.name}
        for a in articles
    ]
    # Only 1 SQL statement: a single JOIN does it all

# ✅ Option 2: IN query (suits one-to-many relationships and large related data sets)
# Alternative to option 1: register only one of the two handlers on this path
@app.get("/articles")
async def list_articles_with_comments(db: AsyncSession = Depends(get_db)):
    result = await db.execute(
        select(Article).options(
            selectinload(Article.author),
            selectinload(Article.comments)
        )
    )
    articles = result.unique().scalars().all()

    return [
        {
            "title": a.title,
            "author": a.author.name,
            "comment_count": len(a.comments)
        }
        for a in articles
    ]
    # 3 SQL statements (articles + authors IN + comments IN) instead of 1+N+N

3.3 Performance comparison

20 articles, 5 comments each:

N+1 queries:  21 SQL statements, average 180ms
joinedload:   1 SQL statement, average 12ms (↓93%)
selectinload: 3 SQL statements, average 18ms (↓90%)
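
The statement counts are easy to verify outside the ORM. The sketch below uses the standard-library `sqlite3` module with a hypothetical two-table schema (`authors`, `articles`) and a small `run` helper that counts statements. It illustrates the query shapes only, not SQLAlchemy's loader internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author-{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(i, f"title-{i}", (i % 5) + 1) for i in range(1, 21)])

query_count = 0


def run(sql, params=()):
    """Execute one statement, count it, and return all rows."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1: one query for the list, then one more per article
query_count = 0
n_plus_one = []
for title, author_id in run("SELECT title, author_id FROM articles ORDER BY id"):
    [(name,)] = run("SELECT name FROM authors WHERE id = ?", (author_id,))
    n_plus_one.append((title, name))
n_plus_one_queries = query_count

# JOIN: the same data in a single round trip
query_count = 0
joined = run("""
    SELECT a.title, u.name
    FROM articles AS a JOIN authors AS u ON u.id = a.author_id
    ORDER BY a.id
""")
join_queries = query_count

print(n_plus_one_queries, join_queries)  # 21 1
```

Both approaches return identical rows; only the number of round trips differs, and that is where the latency comes from.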

4. Response Serialization: Don't Hand-Build Dicts

4.1 The problem

# ❌ Hand-built dicts: slow, and easy to miss a field
@app.get("/users")
async def list_users(db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(User))
    users = result.scalars().all()

    return [
        {
            "id": u.id,
            "name": u.name,
            "email": u.email,
            "created_at": u.created_at.isoformat(),  # easy to forget
            # 20 fields means 20 lines...
        }
        for u in users
    ]

4.2 Solution: Pydantic response_model

from datetime import datetime

from pydantic import BaseModel, ConfigDict

class UserResponse(BaseModel):
    # Pydantic v2: allow building the model from ORM object attributes
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str
    email: str
    created_at: datetime

@app.get("/users", response_model=list[UserResponse])
async def list_users(db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(User))
    return result.scalars().all()
    # Pydantic serializes automatically; FastAPI caches the generated schema

Benefits:

Faster serialization: Pydantic v2 has a Rust core and is 2-3x faster than hand-building dicts
Automatic docs: Swagger UI generates the response structure for you
Field filtering: sensitive fields such as password are not exposed by accident

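5. Caching: Don't Keep Querying the Database for Hot Data

5.1 Simple in-memory cache

from functools import lru_cache

# ✅ For rarely changing data such as configuration, an in-memory cache is enough
@lru_cache(maxsize=128)
def get_system_config(key: str) -> str:
    """System config cache; valid only within the current process."""
    return load_from_db(key)

# Must be invalidated manually (cache_clear drops every cached key)
def update_system_config(key: str, value: str):
    save_to_db(key, value)
    get_system_config.cache_clear()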
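
A runnable sketch of the hit, miss, and invalidation cycle follows. The `fake_db` dict and the `db_reads` counter are hypothetical stand-ins for `load_from_db` and `save_to_db`, which the original leaves undefined.

```python
from functools import lru_cache

db_reads = 0
fake_db = {"site_name": "demo"}   # hypothetical stand-in for the config table


@lru_cache(maxsize=128)
def get_system_config(key: str) -> str:
    global db_reads
    db_reads += 1                  # count lookups that actually reach the "database"
    return fake_db[key]


def update_system_config(key: str, value: str) -> None:
    fake_db[key] = value
    get_system_config.cache_clear()  # drop every cached entry


get_system_config("site_name")
get_system_config("site_name")           # served from the cache
reads_before_update = db_reads

update_system_config("site_name", "prod")
value_after_update = get_system_config("site_name")  # cache was cleared, so this reads again

print(reads_before_update, db_reads, value_after_update)
print(get_system_config.cache_info())
```

Because the cache lives inside one process, each worker process keeps its own copy. An update handled by one worker does not clear the caches of the others, which is one reason to move shared hot data to an external cache.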

5.2 Redis cache

import json
from typing import Any, Optional

import redis.asyncio as redis

redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)

async def get_cached(key: str) -> Optional[Any]:
    """Read a cached value from Redis; returns None on a miss."""
    data = await redis_client.get(key)
    return json.loads(data) if data is not None else None

async def set_cached(key: str, value: Any, ttl: int = 300) -> None:
    """Write a value to Redis with a TTL in seconds."""
    await redis_client.setex(key, ttl, json.dumps(value, ensure_ascii=False))

# Usage
@app.get("/hot-articles")
async def hot_articles(db: AsyncSession = Depends(get_db)):
    cache_key = "hot_articles:page1"

    # Check the cache first
    cached = await get_cached(cache_key)
    if cached is not None:  # an empty list is still a cache hit
        return cached

    # Cache miss: query the database
    result = await db.execute(
        select(Article)
        .where(Article.is_published.is_(True))
        .order_by(Article.view_count.desc())
        .limit(20)
    )
    articles = [{"title": a.title, "views": a.view_count} for a in result.scalars().all()]

    # Write to the cache; expires after 5 minutes
    await set_cached(cache_key, articles, ttl=300)

    return articles

5.3 Load-test comparison

Hot-articles endpoint (1,000 concurrent requests):

No cache:    average 85ms, database CPU 80%
Redis cache: average 3ms, database CPU 5% (↓96%)
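
The cache-aside flow can be exercised without a Redis server. In this sketch an in-memory dict with expiry timestamps stands in for Redis, and `query_hot_articles` is a hypothetical slow database call, so it shows the control flow rather than real Redis behaviour.

```python
import asyncio
import json
import time

_store = {}  # key -> (expires_at, JSON string); in-memory stand-in for Redis


async def get_cached(key):
    entry = _store.get(key)
    if entry is None or entry[0] <= time.monotonic():
        return None                      # miss or expired
    return json.loads(entry[1])


async def set_cached(key, value, ttl=300):
    _store[key] = (time.monotonic() + ttl, json.dumps(value))


db_queries = 0


async def query_hot_articles():
    """Hypothetical slow database query."""
    global db_queries
    db_queries += 1
    await asyncio.sleep(0.01)
    return [{"title": "hello", "views": 42}]


async def hot_articles():
    cached = await get_cached("hot_articles:page1")
    if cached is not None:               # an empty list is still a valid hit
        return cached
    articles = await query_hot_articles()
    await set_cached("hot_articles:page1", articles, ttl=300)
    return articles


async def main():
    first = await hot_articles()         # miss: goes to the database
    second = await hot_articles()        # hit: served from the cache
    return first, second


first, second = asyncio.run(main())
print(db_queries, first == second)
```

Checking `is not None` rather than truthiness matters: with a plain `if cached:`, an endpoint whose result is legitimately empty would hit the database on every request.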

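6. Pagination: Don't Use OFFSET

6.1 The problem

# ❌ OFFSET pagination: page 100 at 20 rows per page reads and discards the first 1,980 rows
@app.get("/articles")
async def list_articles(page: int = 1, size: int = 20, db: AsyncSession = Depends(get_db)):
    offset = (page - 1) * size
    result = await db.execute(
        select(Article).order_by(Article.id.desc()).offset(offset).limit(size)
    )
    return result.scalars().all()

Problem: with OFFSET 99000 LIMIT 20, the database has to scan past 99,000 rows before it returns 20, so the deeper the page, the slower the query.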

6.2 Solution: cursor-based pagination

from typing import Optional

# ✅ ID-cursor pagination: whatever the page, only about 20 rows are scanned
@app.get("/articles")
async def list_articles(
    cursor: Optional[int] = None,  # ID of the last item on the previous page
    size: int = 20,
    db: AsyncSession = Depends(get_db)
):
    # Fetch one extra row to learn whether another page exists
    query = select(Article).order_by(Article.id.desc()).limit(size + 1)

    if cursor is not None:
        query = query.where(Article.id < cursor)

    result = await db.execute(query)
    articles = result.scalars().all()

    has_more = len(articles) > size
    if has_more:
        articles = articles[:size]

    return {
        "data": articles,
        "next_cursor": articles[-1].id if articles and has_more else None,
        "has_more": has_more
    }

6.3 Performance comparison

1 million rows, 20 per page:

OFFSET, page 5000: 320ms (scans about 100,000 rows)
Cursor, page 5000: 2ms (scans only about 20 rows)
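
Here is a self-contained sketch of the cursor loop, using the standard-library `sqlite3` module and a hypothetical `articles` table with 95 rows. The `fetch_page` helper is an illustration of the technique, not the endpoint above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?)",
                 [(i, f"title-{i}") for i in range(1, 96)])  # ids 1..95


def fetch_page(cursor=None, size=20):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, title FROM articles ORDER BY id DESC LIMIT ?",
            (size + 1,)).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, title FROM articles WHERE id < ? ORDER BY id DESC LIMIT ?",
            (cursor, size + 1)).fetchall()
    has_more = len(rows) > size           # the extra row only signals "more pages"
    rows = rows[:size]
    next_cursor = rows[-1][0] if has_more else None
    return rows, next_cursor


# Walk every page by feeding each next_cursor back in
seen_ids, page_sizes, cursor = [], [], None
while True:
    rows, cursor = fetch_page(cursor)
    seen_ids.extend(row[0] for row in rows)
    page_sizes.append(len(rows))
    if cursor is None:
        break

print(page_sizes)  # [20, 20, 20, 20, 15]
```

The `WHERE id < ?` condition lets the primary-key index jump straight to the start of the page. The trade-off is that clients can only move to the next page, not jump to an arbitrary page number.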

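7. Middleware Optimization: Trim the Request Pipeline

7.1 The problem

With too many middlewares installed, every request has to pass through all of them:

# ❌ Every request goes through every middleware, needed or not
app.add_middleware(LoggingMiddleware)       # logging
app.add_middleware(CORSMiddleware, ...)      # CORS
app.add_middleware(RateLimitMiddleware)      # rate limiting
app.add_middleware(AuthMiddleware)           # authentication
app.add_middleware(TracingMiddleware)        # distributed tracing

Each middleware adds 0.5-2ms, so five of them add 2.5-10ms of latency to every request.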

7.2 Solution: apply checks per route

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

# Run each check only for the routes that need it
@app.middleware("http")
async def smart_middleware(request: Request, call_next):
    path = request.url.path

    # Health checks and static assets skip every check in this middleware
    if path in ("/health", "/metrics") or path.startswith("/static/"):
        return await call_next(request)

    # Authenticate only routes under /api/
    if path.startswith("/api/"):
        token = request.headers.get("Authorization")
        if not token or not verify_token(token):
            return JSONResponse(status_code=401, content={"detail": "Unauthorized"})

    # Rate-limit only write operations
    if request.method in ("POST", "PUT", "PATCH", "DELETE"):
        if await is_rate_limited(request):
            return JSONResponse(status_code=429, content={"detail": "Too many requests"})

    return await call_next(request)
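
One way to keep this branching testable is to pull the routing decision into a pure function. The sketch below is an assumption-laden illustration: `checks_for` is a hypothetical helper, and treating PATCH as a write method is a choice made here.

```python
def checks_for(method: str, path: str) -> set:
    """Return the checks a request needs, following the same path and method rules."""
    if path in ("/health", "/metrics") or path.startswith("/static/"):
        return set()                       # cheap endpoints skip every check
    checks = set()
    if path.startswith("/api/"):
        checks.add("auth")                 # only /api/ routes are authenticated
    if method in ("POST", "PUT", "PATCH", "DELETE"):
        checks.add("rate_limit")           # only write operations are rate-limited
    return checks


print(checks_for("GET", "/health"))
print(sorted(checks_for("POST", "/api/users")))
print(checks_for("GET", "/api/users"))
```

With the decision isolated like this, the middleware body only has to act on the returned set, and the rules can be unit-tested without starting the application.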

8. Startup Optimization: Don't Do Heavy Work at Startup

8.1 The problem

# ❌ Loading large amounts of data into memory at startup
@app.on_event("startup")
async def startup():
    global ALL_CONFIGS, ALL_CATEGORIES, ALL_PERMISSIONS
    ALL_CONFIGS = await load_all_configs()           # 2 seconds
    ALL_CATEGORIES = await load_all_categories()     # 1 second
    ALL_PERMISSIONS = await load_all_permissions()   # 3 seconds
    # Startup takes 6 seconds, so rolling deployments interrupt service

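8.2 Solution: lazy loading

import asyncio

class LazyLoader:
    """Lazy loader: nothing is initialized until first access."""

    def __init__(self):
        self._configs = None
        self._categories = None
        self._permissions = None
        self._loaded = False
        self._lock = None  # created on first use, inside the running event loop

    async def ensure_loaded(self):
        if self._loaded:
            return
        if self._lock is None:
            self._lock = asyncio.Lock()

        # The lock keeps concurrent first requests from each triggering a load
        async with self._lock:
            if self._loaded:
                return
            # Load all three in parallel
            self._configs, self._categories, self._permissions = await asyncio.gather(
                load_all_configs(),
                load_all_categories(),
                load_all_permissions(),
            )
            self._loaded = True

    @property
    async def configs(self):
        await self.ensure_loaded()
        return self._configs

loader = LazyLoader()

@app.on_event("startup")
async def startup():
    # Load nothing at startup
    pass

@app.get("/configs")
async def get_configs():
    # Loaded on the first request
    return await loader.configs

Startup time drops from 6 seconds to nearly zero. The first request absorbs the loading cost instead (roughly 3 seconds here, the slowest of the three loads, since they run in parallel); later requests are unaffected.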
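
With lazy loading, many requests can arrive before the first load finishes. The sketch below uses a hypothetical `LazyConfigs` class and a counting `load_all_configs` stub to show that a lock with a second check keeps the load to a single execution.

```python
import asyncio

load_calls = 0


async def load_all_configs():
    """Hypothetical slow loader; counts how often it actually runs."""
    global load_calls
    load_calls += 1
    await asyncio.sleep(0.01)
    return {"feature_x": True}


class LazyConfigs:
    def __init__(self):
        self._value = None
        self._loaded = False
        self._lock = asyncio.Lock()

    async def get(self):
        if not self._loaded:                 # fast path once loaded
            async with self._lock:
                if not self._loaded:         # re-check: another task may have finished loading
                    self._value = await load_all_configs()
                    self._loaded = True
        return self._value


async def main():
    configs = LazyConfigs()                  # created inside the running loop
    # 50 "first requests" arrive at the same time
    return await asyncio.gather(*(configs.get() for _ in range(50)))


results = asyncio.run(main())
print(load_calls, len(results))  # 1 50
```

Without the lock, every task that arrives while `_loaded` is still False would start its own load, which multiplies database traffic at exactly the moment a fresh instance starts taking requests.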

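Summary

Optimization	Impact	Difficulty	Priority
Database connection pool	⭐⭐⭐⭐⭐	Easy	🔴 Highest
N+1 queries	⭐⭐⭐⭐⭐	Easy	🔴 Highest
Async I/O	⭐⭐⭐⭐	Medium	🟠 High
Caching	⭐⭐⭐⭐	Easy	🟠 High
Cursor pagination	⭐⭐⭐	Easy	🟡 Medium
Pydantic serialization	⭐⭐	Easy	🟢 Low
Middleware optimization	⭐⭐	Medium	🟢 Low
Startup optimization	⭐	Easy	🟢 Low

Recommended order: start with connection pooling and N+1 queries, which pay off immediately; then async I/O and caching; apply the rest as needed.

Questions are welcome in the comments 👇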
