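MCP Server Engineering Pitfalls: 8 Production-Grade Traps I Fell Into

This article is for engineers who already know the basic MCP concepts and are building, or about to build, an MCP Server. It does not repeat the official documentation: every section covers something I ran into in real engineering work.

Since an overseas AI company released it in late 2024, the Model Context Protocol (MCP) has become the de facto standard for connecting tools to AI Agents at remarkable speed. By the end of 2025 there were thousands of MCP Server implementations on GitHub, and mainstream AI applications (the major AI IDEs and desktop assistants) supported it natively.

But there is a deep gap between a "thriving ecosystem" and "production-ready".

Over the past few months I have built 5 MCP Servers that run in production, and hit a series of problems along the way: tool definitions blowing the token budget, SSE connection leaks, concurrency safety, failed version negotiation, and more. This article works through the 8 most noteworthy ones systematically, each with fix code you can use directly.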

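Pitfall 1: Tool definitions are too verbose and blow the token budget

The problem

You connect 20 MCP Servers, each exposing 30 tools. The MCP Client puts every tool schema into the system prompt, so before the model reads the user's first sentence it has already spent 60,000 tokens on tool descriptions (600 tools at roughly 100 tokens each).

Official benchmark figures show that once the tool count exceeds 100, load time and inference cost grow superlinearly.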

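# Counter-example: an over-described tool definition
@server.call_tool()
async def handle_query_database(name: str, arguments: dict) -> list[types.TextContent]:
    ...

# Registered in the tools list like this:
types.Tool(
    name="query_database",
    description="""
    This is a powerful database query tool that supports MySQL, PostgreSQL, SQLite and other database types.
    You can query with standard SQL syntax, including advanced features such as JOINs, subqueries, aggregate functions and window functions.
    Results are returned as JSON containing column names and row data. Pagination is supported, with at most 1000 rows per call.
    Notes: make sure the SQL statement is safe and avoid SQL injection... (another 200 characters omitted)
    """,
    inputSchema={...}  # plus yet another complex JSON Schema
)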

The fix

Principle: keep each tool description under 50 characters and each schema to at most 5 fields. Move the details into a resource.

# Good example: a concise description
types.Tool(
    name="query_database",
    description="Run a SQL query; returns JSON (max 1000 rows).",
    inputSchema={
        "type": "object",
        "properties": {
            "sql": {
                "type": "string",
                "description": "SQL query statement"
            },
            "limit": {
                "type": "integer",
                "description": "Maximum rows to return, default 100",
                "default": 100
            }
        },
        "required": ["sql"]
    }
)
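
The two limits in the principle above are easy to check mechanically. The following is a minimal sketch of such a check, not part of the MCP SDK: `lint_tool`, the two threshold constants and the plain-dict tool shape are assumptions made for illustration.

```python
# Hypothetical lint helper: flags tool definitions that break the
# "short description, few fields" rule. Not part of the MCP SDK.
MAX_DESCRIPTION_CHARS = 50
MAX_SCHEMA_FIELDS = 5

def lint_tool(tool: dict) -> list[str]:
    """Return a list of human-readable violations for one tool definition."""
    problems = []
    description = tool.get("description", "").strip()
    if len(description) >= MAX_DESCRIPTION_CHARS:
        problems.append(
            f"{tool['name']}: description has {len(description)} chars "
            f"(limit {MAX_DESCRIPTION_CHARS})"
        )
    fields = tool.get("inputSchema", {}).get("properties", {})
    if len(fields) > MAX_SCHEMA_FIELDS:
        problems.append(
            f"{tool['name']}: schema has {len(fields)} fields "
            f"(limit {MAX_SCHEMA_FIELDS})"
        )
    return problems

good_tool = {
    "name": "query_database",
    "description": "Run a SQL query; returns JSON (max 1000 rows).",
    "inputSchema": {"type": "object", "properties": {"sql": {}, "limit": {}}},
}
bad_tool = {
    "name": "query_everything",
    "description": "x" * 300,
    "inputSchema": {"type": "object", "properties": {f"f{i}": {} for i in range(8)}},
}

print(lint_tool(good_tool))
print(lint_tool(bad_tool))
```

Running a check like this in CI over every registered tool keeps descriptions from growing back over time.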

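Also, if your tools really are complex, consider the "code execution mode" described in the protocol design documents: the model writes a piece of code that calls the tools, a sandbox executes it, and intermediate results never pass through the context window. With more than 200 tools this can cut token consumption by over 60%.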

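Pitfall 2: Connection leaks in SSE transport mode

The problem

After deploying an MCP Server in SSE (Server-Sent Events) mode and running it for a week, the server ran out of file descriptors (fds) and the process crashed.

Cause: SSE uses long-lived connections, and the Server does not always notice promptly when a Client disconnects. On top of that, some MCP Client implementations have a bug where they reconnect without closing the old connection.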

# Dangerous: no connection lifecycle management
from mcp.server.sse import SseServerTransport

app = Starlette()
sse = SseServerTransport("/messages")

@app.route("/sse")
async def handle_sse(request):
    async with sse.connect_sse(request.scope, request.receive, request._send) as streams:
        await server.run(streams[0], streams[1], InitializationOptions(...))
    # Problem: on an exception, streams may not be cleaned up properly

The fix

import asyncio
import logging
from contextlib import asynccontextmanager
from starlette.applications import Starlette
from starlette.routing import Route

logger = logging.getLogger(__name__)

# Track active connections by id. A plain set is used because the streams
# tuple yielded by connect_sse cannot be weakly referenced.
_active_connections: set[int] = set()

@asynccontextmanager
async def managed_sse_connection(sse_transport, request):
    """SSE connection manager with a timeout and guaranteed cleanup."""
    connection_id = id(request)
    try:
        async with asyncio.timeout(3600):  # 1-hour cap; needs Python 3.11+
            async with sse_transport.connect_sse(
                request.scope, request.receive, request._send
            ) as streams:
                _active_connections.add(connection_id)
                yield streams
    except asyncio.TimeoutError:
        pass  # Expected timeout, not an error
    except Exception as e:
        logger.error(f"SSE connection {connection_id} error: {e}")
        raise
    finally:
        # Always drop the entry so the active count stays accurate
        _active_connections.discard(connection_id)
        logger.info(f"SSE connection {connection_id} cleaned up. "
                    f"Active: {len(_active_connections)}")

async def handle_sse(request):
    async with managed_sse_connection(sse, request) as streams:
        await server.run(
            streams[0], streams[1],
            InitializationOptions(server_name="my-server", server_version="1.0.0")
        )

Also add connection timeouts at the nginx/caddy layer:

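# nginx.conf
location /sse {
    proxy_pass http://localhost:8080;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    # Key: this is SSE, so nginx must not buffer the response
    proxy_buffering off;
    proxy_cache off;
    # Use HTTP/1.1 without "Connection: close" so the stream stays open
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}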
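
A fixed timeout only caps how long a connection can live. To catch connections that went quiet much earlier, one option is to record the last activity per connection and reap the idle ones periodically. The sketch below shows only the bookkeeping; `ConnectionRegistry` and its method names are hypothetical, and wiring it into the SSE handler is left to the reader.

```python
import time

class ConnectionRegistry:
    """Hypothetical helper: remembers when each connection was last active."""

    def __init__(self, idle_timeout: float, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._last_seen: dict[int, float] = {}

    def touch(self, connection_id: int) -> None:
        """Record activity (call on connect and on every message)."""
        self._last_seen[connection_id] = self._clock()

    def remove(self, connection_id: int) -> None:
        """Forget a connection once it has been closed."""
        self._last_seen.pop(connection_id, None)

    def stale(self) -> list[int]:
        """Ids of connections idle for longer than idle_timeout."""
        now = self._clock()
        return [cid for cid, seen in self._last_seen.items()
                if now - seen > self._idle_timeout]

# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
registry = ConnectionRegistry(idle_timeout=60, clock=lambda: fake_now[0])
registry.touch(1)
fake_now[0] = 50.0
registry.touch(2)
fake_now[0] = 100.0
stale_ids = registry.stale()  # connection 1 idle for 100 s, connection 2 for 50 s
print(stale_ids)
```

Injecting the clock keeps the class testable without sleeping; in production the default `time.monotonic` is used.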

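Pitfall 3: Race conditions in concurrent tool calls

The problem

The MCP protocol itself supports concurrent requests (several in-flight JSON-RPC requests on the same connection). When an AI Agent issues 3 tool calls at once and your Server code shares mutable state, you get data races.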

# Dangerous: shared mutable state
class DatabaseServer:
    def __init__(self):
        self.connection = create_db_connection()  # a single connection
        self.last_query_result = None  # shared state

    @server.call_tool()
    async def query(self, name, arguments):
        # Dangerous: concurrent requests all share self.connection
        cursor = self.connection.cursor()
        cursor.execute(arguments["sql"])
        self.last_query_result = cursor.fetchall()  # race!
        return self.last_query_result
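
To see the race without a real database, here is a small self-contained reproduction. `SharedStateServer` is a toy stand-in for the class above, and `asyncio.sleep(0)` plays the role of the I/O wait during which other requests get to run.

```python
import asyncio

class SharedStateServer:
    """Toy stand-in for the class above: one shared result slot."""

    def __init__(self):
        self.last_query_result = None

    async def query(self, sql: str) -> str:
        self.last_query_result = f"rows for {sql}"
        await asyncio.sleep(0)          # yield to the event loop, like real I/O would
        return self.last_query_result   # may now hold another request's result

async def demo() -> list[str]:
    server = SharedStateServer()
    return await asyncio.gather(
        server.query("SELECT 1"),
        server.query("SELECT 2"),
        server.query("SELECT 3"),
    )

results = asyncio.run(demo())
print(results)
```

All three calls come back with the rows of the last query to write the shared slot, which is exactly the kind of cross-request leak the fix below removes.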

The fix

Use a connection pool and eliminate shared mutable state entirely:

import json
import os
from contextlib import asynccontextmanager

import asyncpg
import mcp.types as types

class DatabaseServer:
    def __init__(self):
        self._pool: asyncpg.Pool | None = None

    async def initialize(self):
        self._pool = await asyncpg.create_pool(
            dsn=os.getenv("DATABASE_URL"),
            min_size=2,
            max_size=10,
            command_timeout=30
        )

    @asynccontextmanager
    async def get_connection(self):
        """Acquire a dedicated connection per call and return it to the pool afterwards."""
        async with self._pool.acquire() as conn:
            yield conn

    async def handle_query(self, name: str, arguments: dict):
        async with self.get_connection() as conn:
            # Each concurrent call has its own conn, so there is no race
            rows = await conn.fetch(arguments["sql"])
            return [types.TextContent(
                type="text",
                # default=str covers values json cannot encode natively (dates, Decimal)
                text=json.dumps([dict(row) for row in rows], ensure_ascii=False, default=str)
            )]

For non-database scenarios (such as file operations), use asyncio.Lock or asyncio.Semaphore for fine-grained control:

import asyncio

import aiofiles
import mcp.types as types

class FileServer:
    def __init__(self):
        # One lock per file path
        self._write_locks: dict[str, asyncio.Lock] = {}

    def _get_lock(self, filepath: str) -> asyncio.Lock:
        if filepath not in self._write_locks:
            self._write_locks[filepath] = asyncio.Lock()
        return self._write_locks[filepath]

    async def handle_write_file(self, name: str, arguments: dict):
        path = arguments["path"]
        async with self._get_lock(path):  # writes to the same file are serialized
            async with aiofiles.open(path, "w") as f:
                await f.write(arguments["content"])
        return [types.TextContent(type="text", text=f"Write succeeded: {path}")]
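
The example above covers asyncio.Lock. For the asyncio.Semaphore case, where several calls may run at once but only up to a limit, a minimal sketch looks like this. `ThrottledWorker`, `run_job` and the `peak` counter are hypothetical names used only for the demonstration.

```python
import asyncio

class ThrottledWorker:
    """Hypothetical example: at most `limit` jobs run at the same time."""

    def __init__(self, limit: int):
        self._semaphore = asyncio.Semaphore(limit)
        self.running = 0
        self.peak = 0  # highest concurrency observed, for the demo only

    async def run_job(self, job_id: int) -> int:
        async with self._semaphore:
            self.running += 1
            self.peak = max(self.peak, self.running)
            await asyncio.sleep(0.01)  # stands in for the real work
            self.running -= 1
        return job_id

async def demo() -> tuple[list[int], int]:
    worker = ThrottledWorker(limit=3)
    done = await asyncio.gather(*(worker.run_job(i) for i in range(10)))
    return list(done), worker.peak

finished, peak = asyncio.run(demo())
print(finished, peak)
```

A Lock is the right tool when an operation must be exclusive per resource; a Semaphore fits when the concern is overall load, for example capping parallel calls to a rate-limited upstream API.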

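Pitfall 4: Protocol version negotiation fails silently

The problem

After one MCP SDK update, the Server could no longer connect to older Clients, yet there was no error log at all. The connection simply failed in silence.

Cause: MCP negotiates versions during the initialize handshake. If the Client's and Server's protocolVersion are incompatible the connection is rejected, but the default log level prints no details.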

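# Turn on debug logging to see the actual handshake traffic (debugging only)
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# list_tools is one of the first methods a Client calls, so a log line here
# confirms that the connection was established
@server.list_tools()
async def handle_list_tools() -> list[types.Tool]:
    logger.info("list_tools called: connection established successfully")
    return [...]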

The fix

Handle version compatibility explicitly and add structured logging:

from mcp.server import Server
import structlog

logger = structlog.get_logger()

# Explicitly declare the supported protocol versions
SUPPORTED_PROTOCOL_VERSIONS = ["2024-11-05", "2025-03-26"]

async def create_server():
    server = Server("my-mcp-server")

    # Observe the initialize handshake. _handle_initialize is a private hook,
    # not a documented API: confirm it exists in your SDK version.
    original_handle = getattr(server, "_handle_initialize", None)
    if original_handle is None:
        logger.warning("mcp_initialize_hook_missing")
        return server

    async def logged_initialize(params):
        client_version = params.protocolVersion
        logger.info(
            "mcp_initialize",
            client_version=client_version,
            supported_versions=SUPPORTED_PROTOCOL_VERSIONS,
            compatible=client_version in SUPPORTED_PROTOCOL_VERSIONS
        )

        if client_version not in SUPPORTED_PROTOCOL_VERSIONS:
            logger.warning(
                "mcp_version_mismatch",
                client_version=client_version,
                will_attempt="best-effort"
            )

        return await original_handle(params)

    server._handle_initialize = logged_initialize
    return server
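
For reference, the negotiation rule in the MCP specification is that a server echoes the client's version when it supports it, and otherwise answers with a version it does support (normally its latest); the client then decides whether to continue. The sketch below expresses that rule as plain functions. `negotiate_version` and `client_accepts` are hypothetical helpers written for illustration, not SDK functions.

```python
SUPPORTED_PROTOCOL_VERSIONS = ["2024-11-05", "2025-03-26"]  # oldest to newest

def negotiate_version(client_version: str,
                      supported: list[str] = SUPPORTED_PROTOCOL_VERSIONS) -> str:
    """Pick the protocolVersion the server puts in its initialize result."""
    if client_version in supported:
        return client_version   # same version: both sides agree
    return supported[-1]        # otherwise offer the server's latest

def client_accepts(server_version: str, client_supported: list[str]) -> bool:
    """Client side: continue only if it also speaks the server's answer."""
    return server_version in client_supported

print(negotiate_version("2024-11-05"))
print(negotiate_version("1999-01-01"))
```

Logging both the requested version and the negotiated one at this point is what turns a silent failure into a one-line diagnosis.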

Also constrain the SDK version at deployment time so that an automatic upgrade cannot break production:

# pyproject.toml
[project]
dependencies = [
    "mcp>=1.3.0,<2.0.0",  # stay within major version 1
]

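Pitfall 5: Messy Resource URI design stops the LLM from referencing resources correctly

The problem

We designed Resource URIs around database auto-increment IDs (resource://item/12345). After a database migration the IDs changed, and every resource reference in historical conversations broke.

A more serious problem: if the URI is not semantic enough, the LLM has no idea which resource to request.

The fix

Principle: Resource URIs must be semantic, stable and human-readable.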

import re

# Counter-example: opaque URIs
"resource://db/12345"           # what does the ID mean?
"resource://cache/a3f9b2"       # what does the hash mean?

# Good example: semantic URIs
"resource://github/repos/octocat/hello-world/readme"    # clear
"resource://local/projects/myapp/src/main.py"           # clear
"resource://jira/projects/PROJ/issues/PROJ-123"         # clear

# Python example: build semantic URIs dynamically
def make_resource_uri(resource_type: str, *path_components: str) -> str:
    """Build a stable, semantic resource URI."""
    # Normalize path components by replacing special characters
    clean_parts = [
        re.sub(r'[^\w\-./]', '_', str(p)).strip('/')
        for p in path_components
    ]
    return f"resource://{resource_type}/{'/'.join(clean_parts)}"

# Usage (owner, repo and filepath come from the caller)
uri = make_resource_uri("github", "repos", owner, repo, "contents", filepath)
# → "resource://github/repos/octocat/hello-world/contents/README.md"
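
A semantic URI also has to be easy to take apart again on the Server side when a read request comes in. As a minimal sketch, the standard library can do the splitting; `parse_resource_uri` is a hypothetical helper, and the `resource://type/path...` layout is the one assumed in the examples above.

```python
from urllib.parse import urlsplit

def parse_resource_uri(uri: str) -> tuple[str, list[str]]:
    """Split a resource:// URI into (resource_type, path components)."""
    parts = urlsplit(uri)
    if parts.scheme != "resource" or not parts.netloc:
        raise ValueError(f"not a resource URI: {uri!r}")
    components = [p for p in parts.path.split("/") if p]
    return parts.netloc, components

resource_type, components = parse_resource_uri(
    "resource://github/repos/octocat/hello-world/readme"
)
print(resource_type, components)
```

Because the resource type lands in the authority position, a handler can dispatch on it first and interpret the remaining components per type.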

Also include a mimeType hint in what list_resources returns, to help the LLM understand the resource type:

import mcp.types as types
from pydantic import AnyUrl

@server.list_resources()
async def handle_list_resources() -> list[types.Resource]:
    return [
        types.Resource(
            uri=AnyUrl(make_resource_uri("local", "docs", "api.md")),
            name="API documentation",
            description="REST API reference for the project",
            mimeType="text/markdown"  # key: tells the LLM what format this is
        ),
        types.Resource(
            uri=AnyUrl(make_resource_uri("local", "data", "config.json")),
            name="Configuration file",
            description="Application runtime configuration",
            mimeType="application/json"
        ),
    ]

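Pitfall 6: Tool results are too large and overflow the context window

The problem

A tool returned the contents of an entire database table (10 MB of JSON). The MCP Client put it into the context, and the next turn either ran out of memory (OOM) or failed with a token-limit error.

The fix

Three layers of defence: pagination, truncation and summarization: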

import json
from dataclasses import asdict, dataclass
from typing import Any

import mcp.types as types

MAX_TOOL_RESULT_CHARS = 8000  # roughly 2000 tokens

@dataclass
class PaginatedResult:
    data: list[Any]
    total: int
    page: int
    page_size: int
    has_more: bool

def truncate_result(result: Any, max_chars: int = MAX_TOOL_RESULT_CHARS) -> str:
    """Truncate a tool result while keeping as much structure as possible."""
    # default=str covers values json cannot encode natively (dates, Decimal)
    text = json.dumps(result, ensure_ascii=False, indent=2, default=str)

    if len(text) <= max_chars:
        return text

    # Truncate, then try to end on a closing brace or bracket
    truncated = text[:max_chars]
    last_brace = max(truncated.rfind('}'), truncated.rfind(']'))
    if last_brace > max_chars * 0.8:
        truncated = truncated[:last_brace + 1]

    char_count = len(text)
    return (
        truncated +
        f"\n\n... [Result truncated: {char_count} characters in total, "
        f"first {len(truncated)} shown. "
        f"Use the page parameter to fetch more results.]"
    )

async def handle_query_database(name: str, arguments: dict):
    # Coerce to int so only numbers are interpolated into the SQL below
    page = max(int(arguments.get("page", 1)), 1)
    page_size = min(max(int(arguments.get("page_size", 50)), 1), 200)  # cap at 200 rows
    offset = (page - 1) * page_size
    sql = arguments["sql"].strip().rstrip(";")  # a trailing ";" would break the wrapping

    # db is the DatabaseServer instance from Pitfall 3
    async with db.get_connection() as conn:
        total = await conn.fetchval(
            f"SELECT COUNT(*) FROM ({sql}) AS q"
        )
        rows = await conn.fetch(
            f"{sql} LIMIT {page_size} OFFSET {offset}"
        )

    result = PaginatedResult(
        data=[dict(r) for r in rows],
        total=total,
        page=page,
        page_size=page_size,
        has_more=(offset + page_size) < total
    )

    output = truncate_result(asdict(result))
    return [types.TextContent(type="text", text=output)]
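
The code above covers pagination and truncation. For the third layer, summarization, one simple approach is to return the shape of the result plus a few sample rows instead of the rows themselves. The sketch below is one assumed way to do that; `summarize_rows` and its `sample_size` parameter are hypothetical.

```python
import json
from typing import Any

def summarize_rows(rows: list[dict[str, Any]], sample_size: int = 3) -> str:
    """Hypothetical summary layer: row count, column names and a few sample rows."""
    columns = sorted({key for row in rows for key in row})
    summary = {
        "row_count": len(rows),
        "columns": columns,
        "sample": rows[:sample_size],
    }
    return json.dumps(summary, ensure_ascii=False, default=str)

rows = [{"id": i, "name": f"user{i}"} for i in range(500)]
summary_text = summarize_rows(rows)
print(summary_text)
```

The model can then decide from the summary whether it needs a specific page, rather than receiving every row up front.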

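Pitfall 7: print debugging statements break the protocol in stdio mode

The problem

In stdio transport mode, MCP sends JSON-RPC messages over stdout. If you casually add a print("debug info") to your code, that text gets mixed into the protocol message stream and the Client fails to parse it. The symptom is either no response after a tool call or a Client crash.

What makes this bug nasty: if the Client you test with locally is fault-tolerant, everything may appear to work, and then it crashes as soon as you switch to another Client.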

# Fatal mistake: in stdio mode nothing may be printed to stdout
@server.call_tool()
async def handle_search(name: str, arguments: dict):
    print(f"DEBUG: searching for {arguments['query']}")  # 💀 breaks the protocol!
    results = await do_search(arguments["query"])
    print(f"DEBUG: found {len(results)} results")        # 💀 same problem
    return [types.TextContent(type="text", text=str(results))]

The fix

All logging must go to stderr. Never use print.

import logging
import sys

# Configure logging to stderr at the program entry point
logging.basicConfig(
    stream=sys.stderr,          # key: write to stderr, not stdout
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)

logger = logging.getLogger(__name__)

@server.call_tool()
async def handle_search(name: str, arguments: dict):
    logger.debug("searching for: %s", arguments["query"])   # ✅ safe
    results = await do_search(arguments["query"])
    logger.info("found %d results", len(results))           # ✅ safe
    return [types.TextContent(type="text", text=str(results))]
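
To keep stray print calls from creeping back in, the check can be automated. The sketch below uses the standard `ast` module to find `print()` calls that do not pass an explicit `file=` argument; `find_stdout_prints` is a hypothetical helper, and unlike a text search it ignores the word "print" inside strings and comments.

```python
import ast

def find_stdout_prints(source: str) -> list[int]:
    """Line numbers of print() calls that do not pass an explicit file=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
            and not any(kw.arg == "file" for kw in node.keywords)
        ):
            hits.append(node.lineno)
    return sorted(hits)

sample = '''
import sys
print("debug")                      # goes to stdout
print("ok", file=sys.stderr)        # explicit stream
text = "print(not a call)"          # a string, not a call
'''
bad_lines = find_stdout_prints(sample)
print(bad_lines, file=sys.stderr) if False else None
```

This only flags calls that rely on the default stream; a call that passes `file=sys.stdout` explicitly would need an extra check.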

If you use third-party libraries, check whether they write to stdout unexpectedly:

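# Guard stdout as a whole so third-party libraries cannot pollute it
import os
import sys

class StdoutGuard:
    """Protects stdout from accidental writes in stdio MCP mode."""

    def __init__(self, real_stdout):
        self._real_stdout = real_stdout
        self._last_allowed = False

    @property
    def buffer(self):
        # Transports that write through sys.stdout.buffer keep working
        return self._real_stdout.buffer

    def write(self, text):
        # Only JSON-RPC messages (starting with {") may reach stdout; the
        # newline that print() writes right after one is let through too
        is_message = text.strip().startswith('{"')
        if is_message or (self._last_allowed and not text.strip()):
            self._real_stdout.write(text)
        else:
            # Redirect to stderr and raise the alarm
            sys.stderr.write(f"[STDOUT_GUARD] Intercepted: {repr(text[:100])}\n")
        self._last_allowed = is_message
        return len(text)

    def flush(self):
        self._real_stdout.flush()

# Install the guard at the top of main()
if os.getenv("MCP_TRANSPORT") == "stdio":
    sys.stdout = StdoutGuard(sys.stdout)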

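Pitfall 8: No health check or graceful shutdown, so deployments go wrong

The problem

When an SSE-mode MCP Server is deployed on Kubernetes, a Pod restart kills tool calls that are still in flight and the Client reports "connection reset". On top of that, there is no /health endpoint, so the load balancer cannot tell whether the Server is ready.

The fix

Add a health check endpoint and graceful shutdown logic: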

import asyncio
import signal
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

# Global state. db, sse, server, init_options, logger and
# managed_sse_connection come from the earlier sections.
_is_shutting_down = False
_active_request_count = 0

async def health_check(request):
    """Kubernetes liveness + readiness probe"""
    if _is_shutting_down:
        return JSONResponse(
            {"status": "shutting_down", "active_requests": _active_request_count},
            status_code=503
        )

    # Check dependencies (database, external APIs, etc.)
    try:
        async with db.get_connection() as conn:
            await conn.fetchval("SELECT 1")
        db_healthy = True
    except Exception as e:
        db_healthy = False
        logger.error("health check db failed: %s", e)

    status = "healthy" if db_healthy else "degraded"
    code = 200 if db_healthy else 503

    return JSONResponse({
        "status": status,
        "active_requests": _active_request_count,
        "version": "1.0.0"
    }, status_code=code)

async def handle_sse_with_tracking(request):
    global _active_request_count

    if _is_shutting_down:
        return JSONResponse({"error": "server shutting down"}, status_code=503)

    _active_request_count += 1
    try:
        async with managed_sse_connection(sse, request) as streams:
            await server.run(streams[0], streams[1], init_options)
    finally:
        _active_request_count -= 1

def setup_graceful_shutdown(app):
    """Set up graceful shutdown: wait for active requests to finish.

    Call this once the event loop is running (for example from a startup
    hook). These handlers replace any SIGTERM/SIGINT handlers the ASGI server
    installed, so make sure the server is still told to exit after draining.
    """

    async def shutdown_handler():
        global _is_shutting_down
        _is_shutting_down = True
        logger.info("Shutdown signal received, waiting for active requests...")

        # Wait at most 30 seconds
        timeout = 30
        while _active_request_count > 0 and timeout > 0:
            logger.info("Waiting... active requests: %d", _active_request_count)
            await asyncio.sleep(1)
            timeout -= 1

        logger.info("Graceful shutdown complete")

    # get_running_loop() fails fast if this is called outside the event loop
    loop = asyncio.get_running_loop()

    def handle_signal(sig):
        logger.info("Received signal: %s", sig)
        loop.create_task(shutdown_handler())

    loop.add_signal_handler(signal.SIGTERM, handle_signal, signal.SIGTERM)
    loop.add_signal_handler(signal.SIGINT, handle_signal, signal.SIGINT)

app = Starlette(routes=[
    Route("/health", health_check),
    Route("/sse", handle_sse_with_tracking),
    Route("/messages", sse.handle_post_message, methods=["POST"]),
])

The matching Kubernetes configuration:

# deployment.yaml
spec:
  containers:
  - name: mcp-server
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 30
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]  # give the load balancer time to drain traffic
  terminationGracePeriodSeconds: 40  # > preStop + shutdown time

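Summary: MCP Server production-readiness checklist

The 8 pitfalls above, condensed into an actionable checklist:

Check item	How to verify
✅ Tool descriptions < 50 characters, schema fields ≤ 5	Count the tokens in the tool definitions
✅ SSE connections have timeout and cleanup logic	Load test + 72 h of connection monitoring
✅ Shared state goes through a connection pool or Lock	asyncio concurrency tests
✅ Protocol version is logged in structured form	Inspect the initialize handshake logs
✅ Resource URIs are semantic and stable	Manual review of the URI design
✅ Tool results are truncated and paginated	Test with large result sets
✅ No print statements in stdio mode	grep -r "print(" src/
✅ /health endpoint + graceful shutdown in place	Kubernetes probe tests

MCP's protocol design is elegant, but there is always engineering debt between "elegant" and "production-ready". I hope these notes save you a few detours.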

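Further thoughts: where are MCP's boundaries?

Beyond the production pitfalls, one architectural point deserves separate mention: MCP is not the whole of an Agent, so don't treat it as a silver bullet.

MCP vs Function Calling: not competitors

The first question many engineers ask after picking up MCP: Function Calling has been working fine for me, so why switch?

The answer: you don't need to switch. The two solve problems at different layers.

Function Calling is a capability of the LLM inference layer. It decides "which tool to call and with what arguments". Each model vendor implements it in its own inference interface, with different API formats.
MCP is an application-layer protocol. It decides "how tools are discovered, connected and invoked". What it standardizes is the communication between the Agent host and tool servers.

How they fit together: the Agent runs a task → the LLM emits a tool-call intent through Function Calling → the host takes that intent and routes it through MCP to the right Server → the Server executes and returns the result.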
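
The routing step in that flow can be pictured as a lookup table on the host side. The sketch below is a deliberately simplified illustration: `ToolRouter`, the intent dict shape and `fake_search_server` are hypothetical, and a real host would send a call_tool request over an MCP client session instead of calling a local coroutine.

```python
import asyncio
from typing import Awaitable, Callable

ToolHandler = Callable[[dict], Awaitable[str]]

class ToolRouter:
    """Hypothetical host-side table: tool name -> handler on some server."""

    def __init__(self):
        self._routes: dict[str, ToolHandler] = {}

    def register(self, tool_name: str, handler: ToolHandler) -> None:
        self._routes[tool_name] = handler

    async def dispatch(self, intent: dict) -> str:
        """`intent` is the tool call the model produced via function calling."""
        handler = self._routes.get(intent["name"])
        if handler is None:
            raise KeyError(f"no server exposes tool {intent['name']!r}")
        return await handler(intent.get("arguments", {}))

async def fake_search_server(arguments: dict) -> str:
    # Stands in for a call_tool request sent to an MCP Server.
    return f"results for {arguments['query']}"

router = ToolRouter()
router.register("search", fake_search_server)
answer = asyncio.run(
    router.dispatch({"name": "search", "arguments": {"query": "mcp"}})
)
print(answer)
```

The model only ever sees tool names and schemas; which Server actually serves a given name is a host concern, which is why the two layers can evolve independently.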

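When should you not use MCP?

In the following scenarios MCP is over-engineering:

Single application, single model: if your tools serve only one internal application, plain functions are enough. An MCP Server adds an extra hop and maintenance cost.
High-frequency, low-latency scenarios: MCP's JSON-RPC communication adds serialization and deserialization overhead. If a tool call has to finish in under 10 ms, a local function call is 10-100x faster than going through an MCP Server.
Tool logic tightly coupled to the business: if a tool needs access to a lot of internal state, extracting it into a standalone Server brings state-synchronization complexity instead.

Where MCP really shines is when tools are reused by several AI applications (one MCP Server serves mainstream AI IDEs such as Cursor and Windsurf as well as in-house Agents), or when a team wants to manage the tool lifecycle in one place (versions, permissions and logs centralized on the Server side).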

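The state of the MCP ecosystem in 2026

As of the first half of 2026, the MCP ecosystem is fairly mature:

SDKs: Python, TypeScript, Java, Go and Rust all have official or community-maintained SDKs
Client support: AI IDEs such as Cursor, VS Code Copilot, Windsurf and Zed all support it natively
Server count: more than 5,000 public MCP Servers on GitHub, covering mainstream scenarios such as databases, file systems, SaaS APIs and code tooling
Enterprise adoption: Cloudflare, Stripe, Atlassian and others have published official MCP Servers

This means many "infrastructure-level" MCP Servers have already been polished by the community, and what you write yourself is mostly "business-logic-level" tools, which is exactly where the 8 pitfalls in this article are concentrated.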
