From Zero to Deployment: A Complete Guide to Building a Production-Grade AI Agent

Abstract: This article walks through building a production-grade AI Agent system from scratch. It covers agent architecture design, core component implementation, tool integration, memory management, evaluation and optimization, and finally deployment. With complete code examples and hands-on cases, you will be able to build an agent system that is ready for production on your own. The article is based on the 2026 technology stack, including core capabilities such as multi-model routing, RAG augmentation, and tool calling.
Chapter 1: Understanding the Core Concepts of AI Agents

1.1 What Is an AI Agent?

An AI Agent is an intelligent system that can perceive its environment, make decisions, and execute actions. Unlike a plain large language model, an agent has the following core capabilities:

- Perception: gathering environment information through sensors or APIs
- Decision-making: forming action plans based on goals and current state
- Execution: calling tools or APIs to complete concrete tasks
- Learning: refining its behavior policy from feedback

By 2026, AI agents have evolved from simple chatbots into autonomous systems that can complete complex tasks independently. According to recent research from Stanford HAI, modern agent systems fall into three tiers:

- Basic: single-turn dialogue, no memory
- Intermediate: multi-turn dialogue, short-term memory, simple tool calls
- Production: long-term memory, complex planning, multi-tool collaboration, self-reflection
1.2 Core Agent Architecture

A typical production-grade agent consists of the following core components:

```
┌─────────────────────────────────────────────────────────┐
│                       User Input                        │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                      Orchestrator                       │
│  - intent recognition  - task decomposition  - control  │
└─────────────────────────────────────────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            ▼               ▼               ▼
    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    │   Memory    │  │    Tools    │  │   Planner   │
    └─────────────┘  └─────────────┘  └─────────────┘
            │               │               │
            └───────────────┼───────────────┘
                            ▼
┌─────────────────────────────────────────────────────────┐
│                        LLM Core                         │
│    multi-model routing, fallback, streaming output      │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                         Output                          │
│             (response / action / tool call)             │
└─────────────────────────────────────────────────────────┘
```
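To make the contracts between these boxes concrete, here is a minimal sketch of the component interfaces as Python Protocols. The names mirror the modules built in Chapters 3 and 4, but the exact shapes (especially `Planner.plan`) are illustrative assumptions, not the article's final API:

```python
# Minimal interface sketch for the architecture above -- illustrative only.
from typing import Any, Dict, List, Protocol


class Memory(Protocol):
    async def load(self, conversation_id: str) -> List[Dict[str, Any]]: ...
    async def save(self, conversation_id: str, messages: List[Dict[str, Any]]) -> None: ...


class Tool(Protocol):
    name: str
    description: str
    async def arun(self, **kwargs: Any) -> Any: ...


class Planner(Protocol):
    # Hypothetical signature: turn a goal into an ordered list of sub-tasks
    async def plan(self, goal: str, context: Dict[str, Any]) -> List[str]: ...
```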
1.3 Why a Production-Grade Agent?

Many developers start with simple agent frameworks while learning, then run into the following problems in production:

- Poor stability: no error handling or retry mechanism
- Performance bottlenecks: no caching or async optimization
- Runaway costs: no token-usage monitoring or model routing
- Poor maintainability: tightly coupled code that is hard to extend

The goal of this article is to help you avoid these pitfalls and build a genuinely usable production-grade system. As a first taste of what that means in practice, the sketch below shows the kind of retry logic a production agent needs around every external call.
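A minimal retry sketch using tenacity (pinned in Chapter 2's dependencies); `call_llm` and its signature are illustrative placeholders, not part of the article's modules:

```python
# Hypothetical retry wrapper around an LLM call using tenacity.
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(
    stop=stop_after_attempt(3),                          # give up after 3 attempts
    wait=wait_exponential(multiplier=1, min=1, max=10),  # 1s, 2s, 4s ... backoff
    reraise=True,                                        # surface the last error
)
async def call_llm(llm, messages):
    # Any transient failure (timeout, rate limit, 5xx) triggers another attempt
    return await llm.ainvoke(messages)
```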
Chapter 2: Environment Setup and Base Framework

2.1 Choosing the Tech Stack

Recommended agent development stack for 2026:

- Python 3.11+: the version with the best support across mainstream AI libraries
- LangChain 0.3+: a mature agent framework
- FastAPI: high-performance API service
- Redis: caching and message queue
- PostgreSQL + pgvector: vector database
- Docker: containerized deployment
2.2 Project Structure

```
ai-agent-project/
├── src/
│   ├── __init__.py
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── core.py            # agent core logic
│   │   ├── memory.py          # memory management
│   │   ├── planner.py         # task planning
│   │   └── orchestrator.py    # orchestrator
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── base.py            # tool base class
│   │   ├── search.py          # search tool
│   │   ├── calculator.py      # calculator tool
│   │   └── api_client.py      # API-call tool
│   ├── models/
│   │   ├── __init__.py
│   │   ├── router.py          # model routing
│   │   └── providers.py       # model providers
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── vector_store.py    # vector storage
│   │   └── conversation.py    # conversation storage
│   └── utils/
│       ├── __init__.py
│       ├── config.py          # configuration management
│       └── logging.py         # logging setup
├── tests/
├── docker/
├── requirements.txt
└── README.md
```
2.3 Installing Dependencies

```bash
# Create a virtual environment
python3.11 -m venv venv
source venv/bin/activate

# Core dependencies
pip install langchain==0.3.0 langchain-core==0.3.0
pip install langchain-community==0.3.0
pip install langchain-openai==0.2.0 langchain-anthropic==0.2.0  # used by the model router in Chapter 5
pip install fastapi==0.115.0 uvicorn==0.32.0
pip install redis==5.0.0 psycopg2-binary==2.9.9
pip install pgvector==0.3.0 sentence-transformers==3.0.0
pip install python-dotenv==1.0.0 pydantic==2.9.0
pip install httpx==0.27.0 tenacity==9.0.0

# Development dependencies
pip install pytest==8.3.0 pytest-asyncio==0.24.0
pip install black==24.10.0 ruff==0.7.0
```
2.4 Configuration

Create a .env file:

```bash
# Model configuration
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
BAIDU_API_KEY=your_baidu_key

# Database configuration
DATABASE_URL=postgresql://user:password@localhost:5432/agent_db
REDIS_URL=redis://localhost:6379/0

# Application configuration
APP_ENV=development
LOG_LEVEL=INFO
MAX_TOKENS=4096
```
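The project tree lists src/utils/config.py but the article does not show it; a minimal sketch using python-dotenv, with field names mirroring the .env keys above:

```python
# src/utils/config.py -- a minimal sketch, assuming python-dotenv from 2.3
import os
from dataclasses import dataclass

from dotenv import load_dotenv

load_dotenv()  # read .env into the process environment


@dataclass(frozen=True)
class Settings:
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    anthropic_api_key: str = os.getenv("ANTHROPIC_API_KEY", "")
    database_url: str = os.getenv("DATABASE_URL", "")
    redis_url: str = os.getenv("REDIS_URL", "redis://localhost:6379/0")
    app_env: str = os.getenv("APP_ENV", "development")
    log_level: str = os.getenv("LOG_LEVEL", "INFO")
    max_tokens: int = int(os.getenv("MAX_TOKENS", "4096"))


settings = Settings()
```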
Chapter 3: Implementing the Core Components

3.1 The Agent Core Class
```python
# src/agent/core.py
import logging
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.runnables import RunnableConfig

logger = logging.getLogger(__name__)


class AgentState(BaseModel):
    """Runtime state of the agent."""
    conversation_id: str
    messages: List[Dict[str, Any]] = Field(default_factory=list)
    context: Dict[str, Any] = Field(default_factory=dict)
    tool_calls: List[Dict[str, Any]] = Field(default_factory=list)
    current_step: int = 0
    max_steps: int = 10


class ProductionAgent:
    """Production-grade AI agent core."""

    def __init__(
        self,
        llm,
        tools: List[Any],
        memory: Any,
        planner: Optional[Any] = None,
        max_iterations: int = 10,
        verbose: bool = True
    ):
        self.llm = llm
        self.tools = {tool.name: tool for tool in tools}
        self.memory = memory
        self.planner = planner
        self.max_iterations = max_iterations
        self.verbose = verbose

    async def run(
        self,
        user_input: str,
        conversation_id: str,
        config: Optional[RunnableConfig] = None
    ) -> Dict[str, Any]:
        """Run the agent's main loop."""
        state = AgentState(
            conversation_id=conversation_id,
            messages=await self.memory.load(conversation_id)
        )

        # Append the user message
        state.messages.append({
            "role": "user",
            "content": user_input
        })

        # Main loop
        for iteration in range(self.max_iterations):
            state.current_step = iteration
            if self.verbose:
                logger.info(f"Iteration {iteration + 1}/{self.max_iterations}")

            # Generate a response
            response = await self._generate_response(state)

            # Did the model request tool calls?
            if response.get("tool_calls"):
                tool_results = await self._execute_tools(
                    response["tool_calls"],
                    state
                )
                state.messages.append({
                    "role": "assistant",
                    "content": response.get("content", ""),
                    "tool_calls": response["tool_calls"]
                })
                state.messages.append({
                    "role": "tool",
                    "content": tool_results
                })
            else:
                # Final answer
                final_response = response.get("content", "")
                await self.memory.save(
                    conversation_id,
                    state.messages
                )
                return {
                    "response": final_response,
                    "conversation_id": conversation_id,
                    "iterations": iteration + 1,
                    "tool_calls": state.tool_calls
                }

        # Hit the iteration cap
        logger.warning(f"Agent reached max iterations for {conversation_id}")
        return {
            "response": "Sorry, I could not finish this task within the step limit.",
            "conversation_id": conversation_id,
            "iterations": self.max_iterations,
            "error": "max_iterations_reached"
        }

    async def _generate_response(self, state: AgentState) -> Dict[str, Any]:
        """Call the LLM to generate a response."""
        # Rebuild the message history
        messages = []
        for msg in state.messages:
            if msg["role"] == "user":
                messages.append(HumanMessage(content=msg["content"]))
            elif msg["role"] == "assistant":
                messages.append(AIMessage(content=msg["content"]))
            elif msg["role"] == "system":
                messages.append(SystemMessage(content=msg["content"]))
            elif msg["role"] == "tool":
                # Simplification: feed tool results back as a human message so
                # the model can see them. A stricter implementation would use
                # ToolMessage with the matching tool_call_id.
                messages.append(HumanMessage(content=f"[Tool results]\n{msg['content']}"))

        # Invoke the LLM
        response = await self.llm.ainvoke(messages)

        # Parse the response
        return {
            "content": response.content if hasattr(response, 'content') else str(response),
            "tool_calls": self._parse_tool_calls(response)
        }

    def _parse_tool_calls(self, response) -> List[Dict[str, Any]]:
        """Parse tool-call requests from the model response."""
        # Simplified: adapt this to the actual response format of your LLM.
        tool_calls = []
        if hasattr(response, 'tool_calls') and response.tool_calls:
            for tc in response.tool_calls:
                tool_calls.append({
                    "name": tc.get('name', ''),
                    "arguments": tc.get('args', {})
                })
        return tool_calls

    async def _execute_tools(
        self,
        tool_calls: List[Dict[str, Any]],
        state: AgentState
    ) -> str:
        """Execute the requested tool calls."""
        results = []
        for tc in tool_calls:
            tool_name = tc["name"]
            if tool_name in self.tools:
                try:
                    tool = self.tools[tool_name]
                    result = await tool.arun(**tc["arguments"])
                    results.append(f"{tool_name}: {result}")
                    state.tool_calls.append({
                        "name": tool_name,
                        "arguments": tc["arguments"],
                        "result": result
                    })
                except Exception as e:
                    logger.error(f"Tool {tool_name} failed: {e}")
                    results.append(f"{tool_name}: Error - {str(e)}")
            else:
                results.append(f"Unknown tool: {tool_name}")
        return "\n".join(results)
```
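A quick local usage sketch for ProductionAgent; the model name, Redis URL, and conversation ID are placeholders for whatever you configured in Chapter 2:

```python
# Standalone demo -- assumes OPENAI_API_KEY is set and Redis is running locally.
import asyncio

from langchain_openai import ChatOpenAI
from src.agent.core import ProductionAgent
from src.agent.memory import ConversationMemory
from src.tools.calculator import CalculatorTool


async def main():
    agent = ProductionAgent(
        llm=ChatOpenAI(model="gpt-4-turbo-preview", temperature=0),
        tools=[CalculatorTool()],
        memory=ConversationMemory("redis://localhost:6379/0"),
    )
    result = await agent.run("What is 17 * 23?", conversation_id="demo-001")
    print(result["response"])


asyncio.run(main())
```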
3.2 The Memory System

```python
# src/agent/memory.py
import json
import logging
from datetime import datetime
from typing import Any, Dict, List

# Use the async Redis client so these awaitable methods don't block the event loop
import redis.asyncio as redis

logger = logging.getLogger(__name__)


class ConversationMemory:
    """Short-term conversation memory backed by Redis."""

    def __init__(self, redis_url: str, max_history: int = 50):
        self.redis = redis.from_url(redis_url)
        self.max_history = max_history

    async def load(self, conversation_id: str) -> List[Dict[str, Any]]:
        """Load conversation history."""
        key = f"conversation:{conversation_id}"
        data = await self.redis.get(key)
        if data:
            return json.loads(data)
        return []

    async def save(self, conversation_id: str, messages: List[Dict[str, Any]]):
        """Persist conversation history."""
        key = f"conversation:{conversation_id}"
        # Cap the history length
        messages = messages[-self.max_history:]
        await self.redis.set(key, json.dumps(messages, ensure_ascii=False))
        # Expire after 7 days
        await self.redis.expire(key, 7 * 24 * 60 * 60)

    async def clear(self, conversation_id: str):
        """Delete conversation history."""
        key = f"conversation:{conversation_id}"
        await self.redis.delete(key)


class LongTermMemory:
    """Long-term memory on top of a vector store."""

    def __init__(self, vector_store: Any, embedding_model: Any):
        self.vector_store = vector_store
        self.embedding_model = embedding_model

    async def store(self, conversation_id: str, content: str, metadata: Dict[str, Any]):
        """Store important information in long-term memory."""
        embedding = self.embedding_model.encode(content)
        self.vector_store.add_vectors(
            vectors=[embedding],
            metadatas=[{
                "conversation_id": conversation_id,
                "content": content,
                "timestamp": datetime.now().isoformat(),
                **metadata
            }]
        )

    async def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        """Search for relevant memories."""
        query_embedding = self.embedding_model.encode(query)
        results = self.vector_store.search(
            query_vector=query_embedding,
            top_k=top_k
        )
        return results
```
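LongTermMemory expects a vector store with `add_vectors()` and `search()`. The project tree lists src/storage/vector_store.py but the article does not show it; here is a minimal pgvector-backed sketch. It assumes a 384-dimension embedding model (e.g. sentence-transformers' all-MiniLM-L6-v2) and a database user allowed to create the pgvector extension:

```python
# src/storage/vector_store.py -- a minimal sketch (assumed implementation).
import json
from typing import Any, Dict, List

import psycopg2


def _to_pgvector(vec: Any) -> str:
    """Render a vector in pgvector's text format: '[0.1,0.2,...]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


class PgVectorStore:
    """Tiny pgvector-backed store matching the interface LongTermMemory uses."""

    def __init__(self, dsn: str, dim: int = 384):
        self.conn = psycopg2.connect(dsn)
        with self.conn.cursor() as cur:
            cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
            cur.execute(
                "CREATE TABLE IF NOT EXISTS memories ("
                "id SERIAL PRIMARY KEY, "
                f"embedding vector({dim}), "
                "metadata JSONB)"
            )
        self.conn.commit()

    def add_vectors(self, vectors: List[Any], metadatas: List[Dict[str, Any]]) -> None:
        with self.conn.cursor() as cur:
            for vec, meta in zip(vectors, metadatas):
                cur.execute(
                    "INSERT INTO memories (embedding, metadata) VALUES (%s::vector, %s)",
                    (_to_pgvector(vec), json.dumps(meta, ensure_ascii=False)),
                )
        self.conn.commit()

    def search(self, query_vector: Any, top_k: int = 5) -> List[Dict[str, Any]]:
        with self.conn.cursor() as cur:
            # <-> is pgvector's L2-distance operator; smaller means closer
            cur.execute(
                "SELECT metadata, embedding <-> %s::vector AS distance "
                "FROM memories ORDER BY distance LIMIT %s",
                (_to_pgvector(query_vector), top_k),
            )
            return [{"metadata": m, "distance": d} for m, d in cur.fetchall()]
```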
Chapter 4: Tool System Integration

4.1 Designing the Tool Base Class

```python
# src/tools/base.py
import asyncio
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict

from pydantic import BaseModel

logger = logging.getLogger(__name__)


class ToolDefinition(BaseModel):
    """Tool definition."""
    name: str
    description: str
    parameters: Dict[str, Any]


class BaseTool(ABC):
    """Base class for all tools."""
    name: str = "base_tool"
    description: str = "Base tool"

    @abstractmethod
    async def arun(self, **kwargs) -> Any:
        """Run the tool asynchronously."""
        ...

    def run(self, **kwargs) -> Any:
        """Run the tool synchronously (delegates to the async version)."""
        return asyncio.run(self.arun(**kwargs))

    def get_definition(self) -> ToolDefinition:
        """Return the tool definition (so the LLM can understand the tool)."""
        return ToolDefinition(
            name=self.name,
            description=self.description,
            parameters=self._get_parameters_schema()
        )

    def _get_parameters_schema(self) -> Dict[str, Any]:
        """Return the JSON Schema of the parameters (override in subclasses)."""
        return {}
```
4.2 Implementing Useful Tools

```python
# src/tools/search.py
from typing import Any, Dict

import httpx

from .base import BaseTool


class WebSearchTool(BaseTool):
    """Web search tool."""
    name = "web_search"
    description = "Search the web for up-to-date information"

    def __init__(self, api_key: str, engine: str = "google"):
        self.api_key = api_key
        self.engine = engine
        # Placeholder endpoint -- swap in your actual search provider's API
        self.base_url = "https://api.searchengine.com/search"

    async def arun(self, query: str, num_results: int = 5) -> Dict[str, Any]:
        """Run the search."""
        async with httpx.AsyncClient() as client:
            response = await client.get(
                self.base_url,
                params={
                    "q": query,
                    "num": num_results,
                    "key": self.api_key
                },
                timeout=30.0
            )
            response.raise_for_status()
            data = response.json()
        return {
            "query": query,
            "results": [
                {
                    "title": item.get("title", ""),
                    "url": item.get("url", ""),
                    "snippet": item.get("snippet", "")
                }
                for item in data.get("results", [])[:num_results]
            ]
        }

    def _get_parameters_schema(self) -> Dict[str, Any]:
        return {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search keywords"
                },
                "num_results": {
                    "type": "integer",
                    "description": "Number of results to return",
                    "default": 5
                }
            },
            "required": ["query"]
        }
```

```python
# src/tools/calculator.py
from typing import Any, Dict

from .base import BaseTool


class CalculatorTool(BaseTool):
    """Calculator tool."""
    name = "calculator"
    description = "Evaluate math expressions"

    async def arun(self, expression: str) -> Dict[str, Any]:
        """Evaluate the expression."""
        try:
            # Restricted eval: no builtins, only a small whitelist of functions.
            # Note this is NOT fully safe against hostile input; see the
            # AST-based alternative below.
            result = eval(expression, {"__builtins__": {}}, {
                "abs": abs, "round": round, "pow": pow,
                "sum": sum, "min": min, "max": max
            })
            return {"expression": expression, "result": result}
        except Exception as e:
            return {"error": str(e)}

    def _get_parameters_schema(self) -> Dict[str, Any]:
        return {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression, e.g. '2 + 2 * 3'"
                }
            },
            "required": ["expression"]
        }
```
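Restricted `eval` is a stopgap: attribute tricks can still escape an emptied `__builtins__`. A stricter alternative is to walk the AST and allow only numeric literals and arithmetic operators. A minimal sketch (extend the operator table as needed):

```python
# AST-based safe arithmetic evaluator -- a sketch, not the article's code.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}


def safe_eval(expression: str) -> float:
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Disallowed expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


assert safe_eval("2 + 2 * 3") == 8
```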
4.3 Tool Registration and Discovery

```python
# src/tools/__init__.py
from typing import Any, Dict, List

from .base import BaseTool
from .calculator import CalculatorTool
from .search import WebSearchTool


def get_default_tools(config: dict) -> List[BaseTool]:
    """Return the default tool list."""
    tools = [
        CalculatorTool(),
        WebSearchTool(api_key=config.get("SEARCH_API_KEY", ""))
    ]
    return tools


def get_tool_definitions(tools: List[BaseTool]) -> List[Dict[str, Any]]:
    """Return all tool definitions (for prompt engineering)."""
    # model_dump() is the pydantic v2 replacement for the deprecated .dict()
    return [tool.get_definition().model_dump() for tool in tools]
```
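For the `tool_calls` parsed in Chapter 3 to ever be populated, the tool schemas must actually be attached to the model. With recent LangChain chat models this is typically done via `bind_tools`; a sketch assuming OpenAI-style function calling (the wrapping into `{"type": "function", ...}` dicts is one workable shape, not the only one):

```python
# Hypothetical wiring of tool definitions into the LLM.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0.7)
tools = get_default_tools({"SEARCH_API_KEY": "your_key"})

# bind_tools also accepts plain OpenAI-format JSON schemas
llm_with_tools = llm.bind_tools([
    {
        "type": "function",
        "function": {
            "name": t.name,
            "description": t.description,
            "parameters": t._get_parameters_schema() or {"type": "object", "properties": {}},
        },
    }
    for t in tools
])
# Pass llm_with_tools (not the bare llm) into ProductionAgent.
```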
Chapter 5: Model Routing and Optimization

5.1 Multi-Model Routing Strategy

In production, no single model fits every scenario. We need to pick models dynamically based on task type, cost budget, and performance requirements.
```python
# src/models/router.py
import logging
from enum import Enum
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


class ModelTier(Enum):
    """Model tiers."""
    PREMIUM = "premium"    # Highest quality, e.g. GPT-4, Claude 3 Opus
    STANDARD = "standard"  # Quality/cost balance, e.g. GPT-3.5, Claude 3 Sonnet
    ECONOMY = "economy"    # Low cost, e.g. domestic Chinese models, small models


class ModelRouter:
    """Model router with task-aware selection and fallback."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.models = self._initialize_models()
        self.fallback_chain = self._setup_fallback_chain()

    def _initialize_models(self) -> Dict[str, Any]:
        """Initialize the available models."""
        from langchain_openai import ChatOpenAI
        from langchain_anthropic import ChatAnthropic
        return {
            "gpt-4-turbo": ChatOpenAI(
                model="gpt-4-turbo-preview",
                temperature=0.7,
                max_tokens=4096
            ),
            "gpt-3.5-turbo": ChatOpenAI(
                model="gpt-3.5-turbo-0125",
                temperature=0.7,
                max_tokens=4096
            ),
            "claude-3-opus": ChatAnthropic(
                model="claude-3-opus-20240229",
                temperature=0.7,
                max_tokens=4096
            ),
            "claude-3-sonnet": ChatAnthropic(
                model="claude-3-sonnet-20240229",
                temperature=0.7,
                max_tokens=4096
            ),
            "qwen-plus": ChatOpenAI(
                base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
                model="qwen-plus",
                temperature=0.7,
                max_tokens=4096
            )
        }

    def select_model(self, task_type: str, complexity: str = "medium") -> str:
        """Pick a model based on task type and complexity."""
        # Simple tasks: economy model
        if complexity == "low":
            return "gpt-3.5-turbo"
        # Complex reasoning: premium model
        if task_type in ["reasoning", "coding", "analysis"]:
            return "gpt-4-turbo"
        # Creative writing: the Claude family
        if task_type in ["creative", "writing"]:
            return "claude-3-sonnet"
        # Default: standard model
        return "gpt-3.5-turbo"

    async def invoke_with_fallback(
        self,
        messages: List[Any],
        primary_model: Optional[str] = None,
        **kwargs
    ) -> Any:
        """Invoke a model with automatic fallback."""
        if primary_model:
            # Try the primary first, then the rest of the chain (no duplicates)
            models_to_try = [primary_model] + [
                m for m in self.fallback_chain if m != primary_model
            ]
        else:
            models_to_try = self.fallback_chain

        last_error = None
        for model_name in models_to_try:
            try:
                logger.info(f"Trying model: {model_name}")
                model = self.models.get(model_name)
                if not model:
                    continue
                response = await model.ainvoke(messages, **kwargs)
                return response
            except Exception as e:
                logger.warning(f"Model {model_name} failed: {e}")
                last_error = e
                continue

        # Every model failed
        raise RuntimeError(f"All models failed. Last error: {last_error}")

    def _setup_fallback_chain(self) -> List[str]:
        """Define the fallback order."""
        return [
            "gpt-4-turbo",
            "claude-3-opus",
            "gpt-3.5-turbo",
            "claude-3-sonnet",
            "qwen-plus"
        ]
```
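A quick usage sketch tying selection and fallback together; the prompt content is illustrative and API keys are assumed to be set in the environment:

```python
# Hypothetical usage of ModelRouter.
import asyncio

from langchain_core.messages import HumanMessage
from src.models.router import ModelRouter


async def main():
    router = ModelRouter(config={})
    model_name = router.select_model(task_type="reasoning", complexity="high")
    response = await router.invoke_with_fallback(
        [HumanMessage(content="Explain the CAP theorem in two sentences.")],
        primary_model=model_name,
    )
    print(response.content)


asyncio.run(main())
```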
5.2 Performance Optimization

5.2.1 Response Caching

```python
# src/utils/cache.py
import hashlib
import json
from datetime import timedelta
from typing import Any, List, Optional

# Async Redis client, consistent with the memory module
import redis.asyncio as redis


class ResponseCache:
    """LLM response cache."""

    def __init__(self, redis_url: str, ttl_hours: int = 24):
        self.redis = redis.from_url(redis_url)
        self.ttl = timedelta(hours=ttl_hours)

    def _generate_key(self, messages: List[Any], model: str) -> str:
        """Build a deterministic cache key from the request."""
        content = json.dumps({
            "messages": [str(m) for m in messages],
            "model": model
        }, sort_keys=True)
        return f"cache:{hashlib.md5(content.encode()).hexdigest()}"

    async def get(self, messages: List[Any], model: str) -> Optional[Any]:
        """Look up a cached response."""
        key = self._generate_key(messages, model)
        data = await self.redis.get(key)
        if data:
            return json.loads(data)
        return None

    async def set(self, messages: List[Any], model: str, response: Any):
        """Cache a response."""
        key = self._generate_key(messages, model)
        await self.redis.setex(
            key,
            int(self.ttl.total_seconds()),
            json.dumps(response, ensure_ascii=False)
        )
```
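How the cache and the router fit together; `cached_invoke` and its names are illustrative glue, not part of the article's modules:

```python
# Hypothetical glue combining ResponseCache and ModelRouter.
async def cached_invoke(router, cache, messages, model_name):
    # Serve repeated queries straight from Redis
    hit = await cache.get(messages, model_name)
    if hit is not None:
        return hit
    response = await router.invoke_with_fallback(messages, primary_model=model_name)
    # Cache only the text content; message objects are not JSON-serializable
    await cache.set(messages, model_name, response.content)
    return response.content
```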
5.2.2 流式输出
python
async def stream_response(agent, user_input: str, conversation_id: str):
"""流式输出响应"""
from langchain_core.callbacks import AsyncCallbackHandler
class StreamHandler(AsyncCallbackHandler):
def __init__(self, send_callback):
self.send_callback = send_callback
async def on_llm_new_token(self, token: str, **kwargs):
await self.send_callback(token)
# 使用流式处理
async for chunk in agent.llm.astream(user_input):
yield chunk.content
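To expose this over HTTP, Server-Sent Events (SSE) via FastAPI's StreamingResponse is a common fit. A sketch, shown as a standalone app for brevity; the `/chat/stream` route is not part of the Chapter 7 API:

```python
# Hypothetical SSE endpoint wrapping stream_response().
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


@app.post("/chat/stream")
async def chat_stream(message: str, conversation_id: str):
    async def event_source():
        # `agent` is assumed to be initialized at startup as in Chapter 7
        async for token in stream_response(agent, message, conversation_id):
            # Server-Sent Events framing: "data: <payload>\n\n"
            yield f"data: {token}\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")
```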
Chapter 6: Evaluation and Testing

6.1 The Metrics System

A production-grade agent needs a well-defined evaluation system:

| Category | Metric | Target |
|---|---|---|
| Accuracy | Task completion rate | >90% |
| Accuracy | Factual correctness | >95% |
| Performance | Average latency | <3 s |
| Performance | P95 latency | <5 s |
| Cost | Cost per conversation | <¥0.1 |
| User experience | User satisfaction | >4.5/5 |
| Reliability | Service availability | >99.9% |
6.2 自动化测试框架
python
# tests/test_agent.py
import pytest
import asyncio
from src.agent.core import ProductionAgent
from src.tools.calculator import CalculatorTool
class TestProductionAgent:
"""Agent 测试套件"""
@pytest.fixture
def agent(self):
"""创建测试用 Agent"""
from langchain_openai import ChatOpenAI
from src.agent.memory import ConversationMemory
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
tools = [CalculatorTool()]
memory = ConversationMemory("redis://localhost:6379/1")
return ProductionAgent(
llm=llm,
tools=tools,
memory=memory,
max_iterations=5
)
@pytest.mark.asyncio
async def test_simple_calculation(self, agent):
"""测试简单计算"""
result = await agent.run(
user_input="计算 2 + 2 * 3",
conversation_id="test_001"
)
assert result["response"]
assert "8" in result["response"] or result["iterations"] > 0
@pytest.mark.asyncio
async def test_conversation_memory(self, agent):
"""测试对话记忆"""
# 第一轮对话
await agent.run(
user_input="我的名字是大旺",
conversation_id="test_memory_001"
)
# 第二轮对话
result = await agent.run(
user_input="我叫什么名字?",
conversation_id="test_memory_001"
)
assert "大旺" in result["response"]
@pytest.mark.asyncio
async def test_max_iterations(self, agent):
"""测试最大迭代限制"""
result = await agent.run(
user_input="请执行一个无限循环的任务",
conversation_id="test_max_iter"
)
assert result["iterations"] <= agent.max_iterations
6.3 压力测试
python
# tests/test_load.py
import asyncio
import pytest
from typing import List
import time
class TestLoadPerformance:
"""压力测试套件"""
@pytest.mark.asyncio
async def test_concurrent_requests(self, agent):
"""测试并发请求处理能力"""
async def make_request(i: int):
start = time.time()
result = await agent.run(
user_input=f"测试请求 {i}",
conversation_id=f"load_test_{i}"
)
elapsed = time.time() - start
return {"request_id": i, "elapsed": elapsed, "success": True}
# 并发 10 个请求
tasks = [make_request(i) for i in range(10)]
results = await asyncio.gather(*tasks, return_exceptions=True)
# 检查成功率
successful = [r for r in results if isinstance(r, dict) and r.get("success")]
assert len(successful) >= 8 # 80% 成功率
# 检查平均响应时间
avg_time = sum(r["elapsed"] for r in successful) / len(successful)
assert avg_time < 5.0 # 平均响应时间<5 秒
Chapter 7: Deployment and Operations

7.1 Containerizing with Docker

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Dependency files
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code
COPY src/ ./src/
COPY tests/ ./tests/

# Environment
ENV PYTHONPATH=/app
ENV APP_ENV=production

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:8000/health')"

# Entrypoint
CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
7.2 Docker Compose 配置
yaml
# docker-compose.yml
version: '3.8'
services:
agent-api:
build: .
ports:
- "8000:8000"
environment:
- DATABASE_URL=postgresql://agent:password@db:5432/agent_db
- REDIS_URL=redis://redis:6379/0
- OPENAI_API_KEY=${OPENAI_API_KEY}
depends_on:
- db
- redis
restart: unless-stopped
db:
image: pgvector/pgvector:pg16
environment:
- POSTGRES_USER=agent
- POSTGRES_PASSWORD=password
- POSTGRES_DB=agent_db
volumes:
- postgres_data:/var/lib/postgresql/data
restart: unless-stopped
redis:
image: redis:7-alpine
volumes:
- redis_data:/data
restart: unless-stopped
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
- ./ssl:/etc/nginx/ssl
depends_on:
- agent-api
restart: unless-stopped
volumes:
postgres_data:
redis_data:
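The compose file mounts ./nginx.conf, which the article never shows. A minimal reverse-proxy sketch; the domain and certificate paths are placeholders, and TLS setup is omitted:

```nginx
# nginx.conf -- minimal sketch; adjust domains and certificate paths.
events {}

http {
    upstream agent_backend {
        server agent-api:8000;  # the compose service name
    }

    server {
        listen 80;
        server_name example.com;  # placeholder domain

        location / {
            proxy_pass http://agent_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            # Required for streaming/SSE responses (Chapter 5.2.2)
            proxy_buffering off;
            proxy_read_timeout 300s;
        }
    }
}
```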
7.3 API 服务实现
python
# src/api/main.py
from fastapi import FastAPI, HTTPException, Depends
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional, Dict, Any
import logging
import uuid
from src.agent.core import ProductionAgent
from src.tools import get_default_tools
from src.agent.memory import ConversationMemory
from src.models.router import ModelRouter
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
app = FastAPI(
title="AI Agent API",
description="生产级 AI Agent 服务",
version="1.0.0"
)
# CORS 配置
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# 全局变量(生产环境建议使用依赖注入)
agent: Optional[ProductionAgent] = None
memory: Optional[ConversationMemory] = None
@app.on_event("startup")
async def startup_event():
"""应用启动时初始化"""
global agent, memory
from langchain_openai import ChatOpenAI
import os
# 初始化组件
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0.7)
tools = get_default_tools({
"SEARCH_API_KEY": os.getenv("SEARCH_API_KEY", "")
})
memory = ConversationMemory(
redis_url=os.getenv("REDIS_URL", "redis://localhost:6379/0")
)
# 创建 Agent
agent = ProductionAgent(
llm=llm,
tools=tools,
memory=memory,
max_iterations=10,
verbose=True
)
logger.info("Agent initialized successfully")
@app.on_event("shutdown")
async def shutdown_event():
"""应用关闭时清理"""
logger.info("Shutting down...")
class ChatRequest(BaseModel):
"""聊天请求"""
message: str
conversation_id: Optional[str] = None
class ChatResponse(BaseModel):
"""聊天响应"""
response: str
conversation_id: str
iterations: int
tool_calls: list = []
@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
"""聊天接口"""
if not agent:
raise HTTPException(status_code=503, detail="Service not ready")
# 生成或复用对话 ID
conversation_id = request.conversation_id or str(uuid.uuid4())
try:
result = await agent.run(
user_input=request.message,
conversation_id=conversation_id
)
return ChatResponse(
response=result["response"],
conversation_id=conversation_id,
iterations=result["iterations"],
tool_calls=result.get("tool_calls", [])
)
except Exception as e:
logger.error(f"Chat error: {e}")
raise HTTPException(status_code=500, detail=str(e))
@app.get("/health")
async def health_check():
"""健康检查"""
return {"status": "healthy", "agent_ready": agent is not None}
@app.get("/metrics")
async def get_metrics():
"""获取指标(生产环境应集成 Prometheus)"""
return {
"agent_initialized": agent is not None,
"memory_connected": memory is not None
}
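The `/metrics` stub above suggests Prometheus for production. One way to wire it up, assuming `pip install prometheus-client` (this package and route are additions, not part of the article's dependency list):

```python
# Hypothetical Prometheus instrumentation for the FastAPI app above.
import time

from prometheus_client import Counter, Histogram, make_asgi_app

REQUESTS = Counter("agent_chat_requests_total", "Chat requests", ["status"])
LATENCY = Histogram("agent_chat_latency_seconds", "Chat request latency")

# Scrape endpoint for the Prometheus server
app.mount("/prometheus", make_asgi_app())


@app.middleware("http")
async def track_metrics(request, call_next):
    start = time.time()
    response = await call_next(request)
    if request.url.path == "/chat":
        REQUESTS.labels(status=str(response.status_code)).inc()
        LATENCY.observe(time.time() - start)
    return response
```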
Chapter 8: Best Practices and Troubleshooting

8.1 Prompt Engineering Best Practices

1. System prompt design
   - Define the role clearly
   - Set behavioral boundaries
   - Provide example formats
2. Context management
   - Cap the history length
   - Summarize important information
   - Inject context dynamically
3. Tool description quality
   - Clear functional descriptions
   - Detailed parameter documentation
   - Usage examples

A sample system prompt putting the first set of rules together follows below.
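An illustrative system prompt following the guidelines above; the product name "Acme" and the wording are examples, not a prescribed template:

```python
# Hypothetical system prompt demonstrating role, boundaries, and format.
SYSTEM_PROMPT = """\
You are a task-oriented assistant for the Acme support team.

Rules:
- Only answer questions about Acme products; politely decline anything else.
- When you need external facts, call the `web_search` tool instead of guessing.
- Answer in the following format:
  Summary: <one sentence>
  Details: <bullet points>
"""
```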
8.2 常见问题排查
问题 1:Agent 陷入无限循环
症状:迭代次数达到上限,任务未完成
解决方案:
- 增加任务分解步骤
- 添加工具调用结果验证
- 设置更明确的终止条件
问题 2:工具调用失败率高
症状:频繁出现工具执行错误
解决方案:
- 完善错误处理逻辑
- 添加参数验证
- 提供友好的错误提示
问题 3:响应速度慢
症状:用户等待时间过长
解决方案:
- 启用响应缓存
- 使用流式输出
- 优化模型选择策略
8.3 Cost Optimization Strategies

- Model routing: send simple tasks to economy models
- Response caching: serve repeated queries straight from cache
- Token optimization: trim prompts and remove redundancy (see the counting sketch below)
- Batching: merge multiple requests
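Token optimization starts with measuring. A small accounting sketch with tiktoken (`pip install tiktoken`, not in the article's dependency list); the price constant is illustrative, not a current rate:

```python
# Hypothetical token/cost estimation with tiktoken.
import tiktoken


def estimate_cost(text: str, model: str = "gpt-3.5-turbo",
                  usd_per_1k_tokens: float = 0.0005) -> float:
    enc = tiktoken.encoding_for_model(model)   # tokenizer matching the model
    n_tokens = len(enc.encode(text))
    return n_tokens / 1000 * usd_per_1k_tokens


print(estimate_cost("Summarize the quarterly report in three bullet points."))
```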
Summary

This article walked through the full process of building a production-grade AI agent from scratch, covering:

- ✅ Core agent architecture design
- ✅ Environment setup and project structure
- ✅ Memory management and tool integration
- ✅ Model routing and performance optimization
- ✅ Evaluation, testing, and quality assurance
- ✅ Containerized deployment and operations
- ✅ Best practices and troubleshooting

After working through this article, you should be able to:

- Understand how a production-grade agent differs from an experimental one
- Set up a complete agent development environment on your own
- Implement the core components and integrate them effectively
- Build a solid evaluation and testing pipeline
- Containerize the system and ship it to production

AI agent technology is evolving quickly, and 2026 looks set to be a pivotal year for large-scale agent adoption. I hope this article helps you succeed in this exciting field!
References

- LangChain official documentation
- Stanford HAI, AI Agent research report
- FastAPI documentation
- Docker official documentation
- pgvector on GitHub
- Redis documentation
- OpenAI API documentation
- Anthropic Claude documentation
Copyright notice: This article is original work, written independently from public sources. The sample code may be used freely for learning and personal projects. Please credit the source when republishing or quoting.

Author: 超人不会飞
Published: March 30, 2026