In Part 1, you learned how to build a basic Agent. Now we go further: how to design scalable, maintainable, production-grade Agent systems. This involves three key areas:
- Custom state management - going beyond a simple message list
- Middleware - intercepting and customizing at key points of Agent execution
- Multi-agent architecture - multiple agents collaborating on complex tasks
Custom State Management
Beyond MessagesState
In Part 1 we used MessagesState - a simple state containing a list of messages. Real-world applications are usually more complex.
When do you need a custom State?
✅ You need to track data beyond the conversation itself (user preferences, session statistics, transient working state)
✅ You want to share non-message data between different nodes of the Agent
✅ You need to implement complex business logic (order processing, multi-step workflows)
✅ You need to support different kinds of data flows (messages, events, metrics)
A realistic example:
A customer support Agent needs more than messages:
- current_ticket_id - the ticket currently being handled
- user_satisfaction_score - the user's satisfaction score
- escalated_to_human - whether the case has been escalated to a human
- conversation_history_summary - a summary of the conversation
- tools_used_count - number of tool invocations (for monitoring and cost tracking)
Defining a Business-Specific State Structure
Use Pydantic to define a type-safe state:
Python:

```python
from typing import Annotated, Sequence
from pydantic import BaseModel, Field
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

# Define a custom ticket model
class TicketStatus(BaseModel):
    ticket_id: str
    status: str    # "open", "waiting_for_customer", "resolved"
    priority: int  # 1-5
    category: str  # "billing", "technical", "general"

# Define the full Agent State
class CustomerSupportAgentState(BaseModel):
    """Complete customer support Agent state"""
    # Messages (merged automatically via add_messages)
    messages: Annotated[Sequence[BaseMessage], add_messages]
    # Business-specific fields
    ticket: TicketStatus
    customer_id: str
    customer_name: str
    is_escalated: bool = False
    escalation_reason: str = ""
    # Metadata and statistics
    tool_calls_count: int = 0
    satisfaction_score: float = 0.0  # 0-5
    attempted_solutions: list[str] = Field(default_factory=list)
    # Decision support
    should_offer_callback: bool = False
```
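As a quick sanity check, Pydantic defaults behave the way the state design relies on. A minimal sketch re-declaring just a few business fields (without the LangGraph message annotation, so it runs standalone):

```python
from pydantic import BaseModel, Field

class TicketStatus(BaseModel):
    ticket_id: str
    status: str
    priority: int

class SupportState(BaseModel):
    ticket: TicketStatus
    is_escalated: bool = False
    attempted_solutions: list[str] = Field(default_factory=list)

state = SupportState(ticket=TicketStatus(ticket_id="t1", status="open", priority=3))
print(state.is_escalated)         # False
print(state.attempted_solutions)  # []

# default_factory gives each instance its own list:
other = SupportState(ticket=TicketStatus(ticket_id="t2", status="open", priority=1))
state.attempted_solutions.append("reboot")
print(other.attempted_solutions)  # [] - unaffected by the other instance
```

This is why `Field(default_factory=list)` is used instead of `= []`: a bare mutable default would be shared across instances.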
TypeScript:

```typescript
import * as z from "zod";
import { MessagesZodState } from "@langchain/langgraph";

const TicketStatusSchema = z.object({
  ticketId: z.string(),
  status: z.enum(["open", "waiting_for_customer", "resolved"]),
  priority: z.number().min(1).max(5),
  category: z.enum(["billing", "technical", "general"]),
});

const CustomerSupportAgentState = z.object({
  ...MessagesZodState.shape,
  // Business-specific fields
  ticket: TicketStatusSchema,
  customerId: z.string(),
  customerName: z.string(),
  isEscalated: z.boolean().default(false),
  escalationReason: z.string().default(""),
  // Metadata and statistics
  toolCallsCount: z.number().default(0),
  satisfactionScore: z.number().min(0).max(5).default(0),
  attemptedSolutions: z.array(z.string()).default([]),
  // Decision support
  shouldOfferCallback: z.boolean().default(false),
});
```
State Flow Best Practices
Principle 1: explicit input-process-output

```python
from langchain_openai import ChatOpenAI

def process_customer_feedback(state: CustomerSupportAgentState) -> dict:
    """Node that processes customer feedback"""
    # Input: read the messages and the existing score from state
    last_message = state["messages"][-1]
    current_score = state["satisfaction_score"]
    # Process: extract a new score (via the LLM)
    llm = ChatOpenAI(model="gpt-4o-mini")
    extraction_prompt = f"""
Based on this customer message, extract satisfaction score (0-5):
{last_message.content}
Current score: {current_score}
"""
    response = llm.invoke(extraction_prompt)
    new_score = float(response.content)
    # Output: return the updates to state
    return {
        "satisfaction_score": new_score,
        "should_offer_callback": new_score >= 4  # offer a callback when highly satisfied
    }
```
Principle 2: use reducers for accumulating list fields
When a field needs to accumulate values over time (such as attempted_solutions), use a reducer:

```python
from typing import Annotated, Sequence
from pydantic import BaseModel, Field
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

def append_solution(current: list[str], new: list[str]) -> list[str]:
    """Append new solutions to the list, avoiding duplicates"""
    # dict.fromkeys deduplicates while preserving insertion order
    # (a plain set would make the ordering nondeterministic)
    return list(dict.fromkeys(current + new))

class AgentState(BaseModel):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    # attempted_solutions is merged with the append_solution reducer
    attempted_solutions: Annotated[list[str], append_solution] = Field(default_factory=list)
```
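The reducer itself is plain Python, so its merge behavior can be exercised directly, independent of the graph:

```python
def append_solution(current: list[str], new: list[str]) -> list[str]:
    # Deduplicate while preserving insertion order
    return list(dict.fromkeys(current + new))

merged = append_solution(
    ["restart router", "clear cache"],
    ["clear cache", "reinstall app"],  # "clear cache" is a duplicate
)
print(merged)  # ['restart router', 'clear cache', 'reinstall app']
```

LangGraph calls this function whenever a node returns a value for the field, passing the existing list and the node's update.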
Principle 3: state validation and derived logic

```python
from pydantic import BaseModel, field_validator

class AgentState(BaseModel):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    ticket: TicketStatus
    satisfaction_score: float = 0.0
    is_escalated: bool = False

    @field_validator('satisfaction_score')
    @classmethod
    def validate_score(cls, v):
        """Ensure the score is within the valid range"""
        if not 0 <= v <= 5:
            raise ValueError('Score must be between 0 and 5')
        return round(v, 1)  # keep one decimal place

    def should_escalate(self) -> bool:
        """Business logic derived from state"""
        # High priority + low satisfaction = automatic escalation
        return (
            self.ticket.priority >= 4 and
            self.satisfaction_score < 2 and
            not self.is_escalated
        )
```
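A quick check of the validator and the escalation rule above (a minimal sketch that drops the message field so it runs standalone):

```python
from pydantic import BaseModel, ValidationError, field_validator

class TicketStatus(BaseModel):
    ticket_id: str
    priority: int

class AgentState(BaseModel):
    ticket: TicketStatus
    satisfaction_score: float = 0.0
    is_escalated: bool = False

    @field_validator('satisfaction_score')
    @classmethod
    def validate_score(cls, v):
        if not 0 <= v <= 5:
            raise ValueError('Score must be between 0 and 5')
        return round(v, 1)

    def should_escalate(self) -> bool:
        return (self.ticket.priority >= 4
                and self.satisfaction_score < 2
                and not self.is_escalated)

state = AgentState(ticket=TicketStatus(ticket_id="t1", priority=5),
                   satisfaction_score=1.44)
print(state.satisfaction_score)  # 1.4 - normalized by the validator
print(state.should_escalate())   # True - priority 5, satisfaction below 2

try:
    AgentState(ticket=state.ticket, satisfaction_score=9)
except ValidationError:
    print("out-of-range score rejected")
```

Validators run at construction time, so invalid state never enters the graph.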
The Power of Middleware
Why middleware
Imagine you have 10 Agents that all need to:
- Pick an LLM dynamically based on message count (GPT-4 for long conversations, GPT-4o mini for short ones)
- Detect and filter harmful content
- Count token usage (for cost accounting)
- Retry failed tool calls
Without middleware, you would duplicate this logic across 10 different Agents. Middleware lets you define it once and use it everywhere.
The three key middleware hooks:
Request comes in
↓
[beforeModel] ← intercept before the LLM call
↓
LLM call
↓
[afterModel] ← intercept after the LLM call
↓
[wrapToolCall] ← intercept around tool execution
↓
Response returned
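The ordering above can be sketched framework-free. This is not the real LangChain middleware API, just a toy driver that shows when each hook fires (`run_with_middleware` and its arguments are invented for illustration):

```python
def run_with_middleware(request, call_model, call_tool, mw):
    """Toy driver: before_model → model → after_model → wrap_tool_call."""
    request = mw.get("before_model", lambda r: r)(request)
    response = call_model(request)
    response = mw.get("after_model", lambda req, res: res)(request, response)
    if response.get("tool_call"):
        wrap = mw.get("wrap_tool_call", lambda req, h: h(req))
        response["tool_result"] = wrap(response, call_tool)
    return response

trace = []
mw = {
    "before_model": lambda r: (trace.append("before_model"), r)[1],
    "after_model": lambda req, res: (trace.append("after_model"), res)[1],
    "wrap_tool_call": lambda req, h: (trace.append("wrap_tool_call"), h(req))[1],
}
out = run_with_middleware(
    {"messages": ["hi"]},
    call_model=lambda r: {"tool_call": "search"},  # fake LLM that requests a tool
    call_tool=lambda r: "result",                  # fake tool
    mw=mw,
)
print(trace)  # ['before_model', 'after_model', 'wrap_tool_call']
```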
Intercepting Model Calls: the beforeModel/afterModel Hooks
A dynamic system prompt:

```python
from langchain.agents import create_agent

# Dynamically adjust the system prompt based on the user's role
def dynamic_system_prompt_middleware():
    def before_model(request):
        """Modify the system prompt before the LLM call"""
        user_role = request.state.get("user_role", "user")
        base_prompt = "You are a helpful customer support agent."
        if user_role == "vip":
            base_prompt += " Provide premium, personalized support."
        elif user_role == "new":
            base_prompt += " Be extra patient and provide detailed explanations."
        request.system_prompt = base_prompt
        return request
    return {"before_model": before_model}

agent = create_agent(
    model="gpt-4o",
    tools=tools,
    middleware=[dynamic_system_prompt_middleware()]
)
```
Response validation (guardrails):

```python
def response_validation_middleware():
    def after_model(request, response):
        """Validate the LLM's reply"""
        # Check for promises/guarantees
        if "guarantee" in response.content.lower():
            # Append a warning message
            response.content += "\n[INTERNAL: This response contains a guarantee - requires manager approval]"
        # Check whether the tone is overly apologetic (may signal a confused LLM)
        politeness_markers = response.content.count("apologize") + response.content.count("sorry")
        if politeness_markers > 3:
            print(f"⚠️ HIGH POLITENESS ALERT: {politeness_markers} apologies in response")
        return response
    return {"after_model": after_model}
```
Dynamic Model Selection (cost/performance trade-off)
This is one of the most practical uses of middleware:
Python:

```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent

# Define several models
gpt4_mini = ChatOpenAI(model="gpt-4o-mini", temperature=0)   # cheap and fast
gpt4 = ChatOpenAI(model="gpt-4o", temperature=0)             # expensive but capable
gpt4_turbo = ChatOpenAI(model="gpt-4-turbo", temperature=0)  # most capable, most expensive

def dynamic_model_selection():
    """Pick the most appropriate model for the context"""
    def wrap_model_call(request, handler):
        messages = request.messages
        message_count = len(messages)
        # Strategy 1: based on conversation length
        if message_count > 20:
            # Long conversation → use the stronger model (there may be implicit context)
            selected_model = gpt4
        elif message_count > 10:
            # Medium conversation → use mini
            selected_model = gpt4_mini
        else:
            # Short conversation → use the cheapest
            selected_model = gpt4_mini
        # Strategy 2: based on conversation complexity
        # Check the length and punctuation of the last message (a rough complexity signal)
        last_message = messages[-1].content if messages else ""
        if len(last_message) > 500 or last_message.count("?") > 2:
            # Complex question → use the stronger model
            selected_model = gpt4
        # Strategy 3: based on user tier (from context)
        user_tier = request.context.get("user_tier", "standard")
        if user_tier == "premium":
            selected_model = gpt4  # VIP users always get the best model
        # Run the call with the selected model
        return handler({**request, "model": selected_model})
    return {"wrap_model_call": wrap_model_call}

agent = create_agent(
    model=gpt4_mini,  # default model
    tools=tools,
    middleware=[dynamic_model_selection()]
)
```
TypeScript:

```typescript
import { createAgent, createMiddleware } from "langchain";
import { ChatOpenAI } from "@langchain/openai";

const gpt4Mini = new ChatOpenAI({ model: "gpt-4o-mini" });
const gpt4 = new ChatOpenAI({ model: "gpt-4o" });

const dynamicModelSelection = createMiddleware({
  name: "DynamicModelSelection",
  wrapModelCall: (request, handler) => {
    const messageCount = request.messages.length;
    let selectedModel = gpt4Mini;
    if (messageCount > 20) {
      selectedModel = gpt4;
    } else if (messageCount > 10) {
      selectedModel = gpt4Mini;
    }
    return handler({
      ...request,
      model: selectedModel,
    });
  },
});

const agent = createAgent({
  model: gpt4Mini,
  tools,
  middleware: [dynamicModelSelection],
});
```
Cost arithmetic:

```python
# Assumptions:
# gpt-4o-mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens
# gpt-4o: $5 per 1M input tokens, $15 per 1M output tokens
# Scenario: 100 conversations averaging 10 messages each
# Always using gpt-4o: ~$50 in cost
# With dynamic selection: ~$10 in cost (an 80% saving)
```
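Back-of-the-envelope numbers like these are easy to compute with a small helper. The per-conversation token counts below are assumptions for illustration, not measurements:

```python
def model_cost(input_tokens: int, output_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one call, given per-1M-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# Assume one 10-message conversation consumes ~30k input and ~3k output tokens in total
conv_in, conv_out = 30_000, 3_000
all_gpt4o = model_cost(conv_in, conv_out, 5.00, 15.00) * 100   # 100 conversations
all_mini = model_cost(conv_in, conv_out, 0.15, 0.60) * 100
print(f"all gpt-4o: ${all_gpt4o:.2f}, all gpt-4o-mini: ${all_mini:.2f}")
# → all gpt-4o: $19.50, all gpt-4o-mini: $0.63
```

The exact savings depend on how traffic splits between the models, but the order-of-magnitude gap is what makes dynamic selection worthwhile.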
Tool Error Handling
When a tool fails, the default behavior is to raise an exception. Middleware lets the Agent handle these errors more intelligently:

```python
from langchain.agents import create_agent
from langchain_core.messages import ToolMessage

def tool_error_handling_middleware():
    """Handle tool errors gracefully"""
    def wrap_tool_call(request, handler):
        try:
            # Success → return the normal result
            return handler(request)
        except Exception as e:
            # Failure → return a conversational error message
            # (truncate the error text to avoid context overflow)
            error_message = f"""
I tried to use the {request.tool_call.name} tool, but encountered an error:
{str(e)[:200]}
Let me try a different approach...
"""
            return ToolMessage(
                content=error_message,
                tool_call_id=request.tool_call.id,
                status="error"  # mark as an error so the LLM can recognize it
            )
    return {"wrap_tool_call": wrap_tool_call}
```
Tool execution with retries:

```python
from tenacity import retry, stop_after_attempt, wait_exponential

def tool_retry_middleware():
    """Automatically retry failed tools"""
    def wrap_tool_call(request, handler):
        @retry(
            stop=stop_after_attempt(3),  # at most 3 attempts
            wait=wait_exponential(multiplier=1, min=1, max=10),  # exponential backoff
        )
        def call_with_retry():
            return handler(request)
        try:
            return call_with_retry()
        except Exception as e:
            # All retries failed
            return ToolMessage(
                content=f"Tool failed after 3 attempts: {str(e)}",
                tool_call_id=request.tool_call.id,
                status="error"
            )
    return {"wrap_tool_call": wrap_tool_call}
```
In Practice: Guardrails and Content Filtering
A complete content-safety middleware:

```python
import re
from better_profanity import profanity  # pip install better-profanity

def safety_guardrails_middleware():
    """Detect and filter unsafe content"""
    FORBIDDEN_TOPICS = ["payment details", "password", "credit card"]
    DANGEROUS_OPERATIONS = ["delete all", "remove database", "drop table"]

    def wrap_model_call(request, handler):
        """Check the input for safety"""
        messages = request.messages
        last_message = messages[-1].content.lower()
        # Check 1: forbidden topics
        for topic in FORBIDDEN_TOPICS:
            if topic in last_message:
                return {
                    "content": "I cannot help with that request as it involves sensitive information.",
                    "safety_flagged": True
                }
        # Check 2: dangerous operations
        for operation in DANGEROUS_OPERATIONS:
            if operation in last_message:
                # Require confirmation
                request.state["require_confirmation"] = True
        # Check 3: profanity
        if profanity.contains_profanity(last_message):
            messages[-1].content = profanity.censor(messages[-1].content)
        # Run the original call
        return handler(request)

    def after_model(request, response):
        """Check the output for safety"""
        response_text = response.content.lower()
        # Check 4: keep the Agent from over-promising
        if "guarantee" in response_text or "promise" in response_text:
            response.content = response.content.replace(
                "guarantee",
                "[redacted - requires manager approval]"
            )
        # Check 5: detect potential information leaks
        if re.search(r'\b\d{4}-\d{4}-\d{4}-\d{4}\b', response_text):  # credit-card-like pattern
            raise ValueError("Response contains potential credit card numbers!")
        return response

    return {
        "wrap_model_call": wrap_model_call,
        "after_model": after_model
    }
```
Multi-Agent Architecture
The Subgraph Concept
A Subgraph is a complete Agent (or workflow) used as a node inside another Agent.
Why use subgraphs?
- Context isolation: different Agents keep independent message histories
- Code reuse: one Agent can be used in multiple systems
- Team collaboration: one team owns the "payment processing Agent", another owns the "order Agent"
- Extensibility: adding a new Agent doesn't require changing existing ones
Visualized:
Main Agent (supervisor)
├─ Tool 1: check inventory
├─ Tool 2: process payment (actually a Subgraph!)
│  └─ Payment Subgraph
│     ├─ Validate card
│     ├─ Process transaction
│     └─ Return result
└─ Tool 3: generate invoice
Communication and Data Passing Between Agents
Pattern 1: the Subgraph has a different State
When the Subgraph needs fully independent state:

```python
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

# Main Agent State
class MainAgentState(TypedDict):
    messages: list
    order_id: str
    customer_id: str

# Payment-processing Subgraph State (completely different)
class PaymentSubgraphState(TypedDict):
    payment_messages: list
    amount: float
    card_token: str
    payment_status: str  # "pending", "completed", "failed"

# Payment Subgraph
def create_payment_subgraph():
    builder = StateGraph(PaymentSubgraphState)

    def validate_payment(state: PaymentSubgraphState):
        # Validate the card
        return {"payment_status": "processing"}

    def process_transaction(state: PaymentSubgraphState):
        # Process the transaction
        return {
            "payment_status": "completed",
            "payment_messages": state["payment_messages"] + ["Payment successful"]
        }

    builder.add_node("validate", validate_payment)
    builder.add_node("process", process_transaction)
    builder.add_edge(START, "validate")
    builder.add_edge("validate", "process")
    builder.add_edge("process", END)
    return builder.compile()

payment_subgraph = create_payment_subgraph()

# Main Agent
def create_main_agent():
    builder = StateGraph(MainAgentState)

    def handle_payment(state: MainAgentState):
        """Invoke the Subgraph from the main Agent"""
        # Translate state: main Agent State → Subgraph State
        payment_input = {
            "payment_messages": [],
            "amount": 99.99,
            "card_token": "tok_visa",
        }
        # Invoke the Subgraph
        payment_result = payment_subgraph.invoke(payment_input)
        # Translate back: Subgraph State → main Agent State
        return {
            "messages": state["messages"] + [
                {"role": "ai", "content": f"Payment {payment_result['payment_status']}"}
            ]
        }

    def process_order(state: MainAgentState):
        # Continue processing the order...
        return {"messages": state["messages"] + [{"role": "ai", "content": "Order processed"}]}

    builder.add_node("payment", handle_payment)
    builder.add_node("order", process_order)
    builder.add_edge(START, "payment")
    builder.add_edge("payment", "order")
    builder.add_edge("order", END)
    return builder.compile()
```
Pattern 2: the Subgraph shares State keys
When the main Agent and the Subgraph need to share message history:

```python
class SharedState(TypedDict):
    messages: Annotated[list, add_messages]  # shared
    conversation_context: str  # shared
    # Main-Agent-specific
    order_id: str
    # Payment-Subgraph-specific
    transaction_id: str

# Payment Subgraph (uses the shared State directly)
# Note: with the add_messages reducer, a node returns only the NEW messages
payment_subgraph = StateGraph(SharedState)
payment_subgraph.add_node("process_payment", lambda s: {
    "messages": [{"role": "ai", "content": "Processing..."}],
    "transaction_id": "txn_123"
})
payment_subgraph.add_edge(START, "process_payment")

# The main Agent adds the Subgraph directly as a node
main_agent = StateGraph(SharedState)
main_agent.add_node("payment", payment_subgraph.compile())  # ← add the Subgraph directly!
```
Multiple Agents on the Same Thread (thread reuse)
A single thread_id can be used by several different Agents, which then share the same conversation history:

```python
from langgraph_sdk import get_client

client = get_client(url="http://localhost:2024")

# Create a thread (one conversation)
thread = await client.threads.create()
thread_id = thread["thread_id"]

# Agent 1: data retrieval Agent
# (runs.wait blocks until the run completes and returns the final output)
result1 = await client.runs.wait(
    thread_id,
    "retrieval_agent",
    input={"messages": [{"role": "user", "content": "Find customer account details"}]}
)

# Agent 2: payment processing Agent (on the same thread)
result2 = await client.runs.wait(
    thread_id,
    "payment_agent",
    input={"messages": [{"role": "user", "content": "Process payment"}]}  # ← can see Agent 1's results!
)

# Fetch the full conversation history
final_state = await client.threads.get_state(thread_id)
print(final_state["values"]["messages"])  # contains messages from both Agents
```
Context Isolation Strategies
The problem: in a multi-Agent system, one Agent's long message history can "pollute" another Agent's context.
The solution: a message-filtering middleware:

```python
def context_isolation_middleware(max_context_messages=10):
    """Limit how many messages each Agent sees"""
    def wrap_model_call(request, handler):
        messages = request.messages
        # Keep only the last N messages plus the first one (system context)
        if len(messages) > max_context_messages:
            isolated_messages = [
                messages[0],  # keep the first message (usually the initial context)
                {"role": "assistant", "content": f"... ({len(messages) - max_context_messages - 1} messages omitted) ..."},
                *messages[-max_context_messages:]  # the last N
            ]
        else:
            isolated_messages = messages
        return handler({**request, "messages": isolated_messages})
    return {"wrap_model_call": wrap_model_call}

# Usage
agent = create_agent(
    model="gpt-4o",
    tools=tools,
    middleware=[context_isolation_middleware(max_context_messages=5)]
)
```
Memory and Context Management
Short-Term Memory (conversation history)
Short-term memory is simply the messages list in State. LangGraph persists and manages it automatically.

```python
from langchain_core.messages import HumanMessage

# Maintain a conversation across multiple invocations
config = {"configurable": {"thread_id": "user_123_session_1"}}

# Call 1
result1 = graph.invoke(
    {"messages": [HumanMessage(content="Hi, I'm interested in the MacBook Pro")]},
    config=config
)

# Call 2 (same thread_id) - the Agent remembers call 1
result2 = graph.invoke(
    {"messages": [HumanMessage(content="What's the price?")]},
    config=config
)

# The Agent sees the full conversation history
print(result2["messages"])
# [
#   HumanMessage("Hi, I'm interested..."),
#   AIMessage("Great choice! The MacBook Pro..."),
#   HumanMessage("What's the price?"),
#   AIMessage("The price is...")
# ]
```
Introducing Long-Term Memory (the LangGraph Store)
Long-term memory preserves information across sessions (user preferences, order history, and so on).
Python:

```python
from langgraph.store.memory import InMemoryStore
from langchain.agents import create_agent
from langchain.tools import tool, ToolRuntime
from dataclasses import dataclass

store = InMemoryStore()

@dataclass
class Context:
    user_id: str

# Write to memory at initialization: put(namespace, key, value)
store.put(
    ("users",),
    "user_123",
    {
        "name": "Alice Johnson",
        "purchase_history": ["MacBook Pro", "iPad"],
        "preferences": {"color": "silver", "storage": "512GB"},
        "loyalty_tier": "gold"
    }
)

@tool
def get_user_preferences(runtime: ToolRuntime[Context]) -> str:
    """Fetch the user's long-term preferences"""
    user_id = runtime.context.user_id
    user_data = store.get(("users",), user_id)
    if user_data:
        return f"""
Name: {user_data.value['name']}
Loyalty Tier: {user_data.value['loyalty_tier']}
Preferred Color: {user_data.value['preferences']['color']}
"""
    return "User not found"

@tool
def update_purchase_history(runtime: ToolRuntime[Context], item: str) -> str:
    """Update the purchase history"""
    user_id = runtime.context.user_id
    user_data = store.get(("users",), user_id)
    if user_data:
        user_data.value["purchase_history"].append(item)
        store.put(("users",), user_id, user_data.value)
        return f"Added {item} to purchase history"
    return "User not found"

agent = create_agent(
    model="gpt-4o",
    tools=[get_user_preferences, update_purchase_history],
    store=store,
    context_schema=Context
)

# Usage
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What do I usually buy?"}]},
    context=Context(user_id="user_123")
)
```
Searching memory across threads:

```python
# Retrieve the user's memories (semantic search plus a metadata filter)
memories = store.search(
    ("users",),               # namespace prefix
    query="payment method",   # semantic search
    filter={"type": "financial"}
)
for memory in memories:
    print(f"Found: {memory.value}")
```
Preventing Context Overflow
With too many messages, the LLM's context window overflows. Two remedies:
1. Message compression

```python
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def compress_messages_middleware():
    """Periodically compress old messages into a summary"""
    def before_model(request):
        messages = request.messages
        # If there are more than 50 messages, compress the first 30
        if len(messages) > 50:
            # Generate a summary with the LLM
            messages_to_compress = messages[:30]
            summary_prompt = f"Summarize this conversation:\n{messages_to_compress}"
            summary = llm.invoke(summary_prompt).content
            # Replace the original messages
            messages = [
                SystemMessage(f"Previous conversation summary:\n{summary}"),
                *messages[30:]
            ]
        return {**request, "messages": messages}
    return {"before_model": before_model}
```
2. Sliding window

```python
from langchain_core.messages import SystemMessage

def sliding_window_middleware(window_size=10):
    """Keep only the last N messages plus system messages"""
    def before_model(request):
        messages = request.messages
        if len(messages) > window_size:
            # Keep system messages and the last N
            system_msgs = [m for m in messages if isinstance(m, SystemMessage)]
            recent_msgs = messages[-window_size:]
            messages = system_msgs + recent_msgs
        return {**request, "messages": messages}
    return {"before_model": before_model}
```
Performance and Cost Optimization
Streaming Basics
Instead of waiting for the Agent to finish all its steps, streaming returns results in real time:
Python:

```python
# Return the Agent's messages incrementally
for chunk in graph.stream(
    {"messages": [HumanMessage(content="Explain machine learning")]},
    stream_mode="updates"  # return only state updates, not the full state
):
    # chunk format: {node_name: state_update}
    for node, state_update in chunk.items():
        if "messages" in state_update:
            latest_msg = state_update["messages"][-1]
            print(f"[{node}]: {latest_msg.content[:100]}...")
```
Streaming tokens in real time:

```python
# Stream tokens during the LLM call
def stream_tokens_middleware():
    def wrap_model_call(request, handler):
        # Stream the call (if the model supports it)
        for token_chunk in llm.stream(request.messages):
            # Push to the client in real time
            send_to_client(token_chunk)
        return handler(request)
    return {"wrap_model_call": wrap_model_call}
```
Token Statistics and Monitoring
Track cost and performance:

```python
def token_tracking_middleware():
    """Record token usage for every call"""
    def after_model(request, response):
        if hasattr(response, "usage_metadata"):
            input_tokens = response.usage_metadata.get("input_tokens", 0)
            output_tokens = response.usage_metadata.get("output_tokens", 0)
            # Compute cost (GPT-4: $0.03 per 1K input tokens, $0.06 per 1K output tokens)
            cost = (input_tokens * 0.03 + output_tokens * 0.06) / 1000
            # Log it
            print(f"📊 Tokens: {input_tokens} in, {output_tokens} out | Cost: ${cost:.4f}")
            # Ship to the monitoring system
            monitor.track("agent.tokens", input_tokens + output_tokens)
            monitor.track("agent.cost", cost)
        return response
    return {"after_model": after_model}
```
Batching Strategies
When handling many requests, batching can significantly reduce cost:

```python
from concurrent.futures import ThreadPoolExecutor
from langchain_core.messages import HumanMessage

def batch_process_customers(customer_list: list[str]):
    """Process multiple customers in a batch"""
    # Build the batch tasks
    tasks = [
        {
            "input": {"messages": [HumanMessage(f"Process customer {cid}")]},
            "config": {"configurable": {"thread_id": f"batch_{i}"}}
        }
        for i, cid in enumerate(customer_list)
    ]
    # Run in parallel
    with ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(
            lambda task: graph.invoke(task["input"], config=task["config"]),
            tasks
        ))
    return results
```
Case Study: a Customer Support Agent
Now let's put all the concepts together and build a production-grade customer support Agent.
Full Architecture
User message
↓
[Input Validation Middleware] - check message content, detect abuse
↓
[Main Agent]
├─ Short-term memory: message history
├─ Long-term memory: customer profile, order history
├─ Tool 1: query orders (Subgraph)
├─ Tool 2: process returns (Subgraph)
├─ Tool 3: escalate issues
└─ Tool 4: generate invoices
↓
[Output Guardrails] - inspect replies, filter sensitive information
↓
[Token Tracking] - record usage
↓
Reply to user
State Design and Flow

```python
from typing import Annotated, Sequence, Optional
from pydantic import BaseModel
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class SupportTicket(BaseModel):
    ticket_id: str
    customer_id: str
    status: str    # "open", "in_progress", "waiting_customer", "resolved"
    category: str  # "billing", "technical", "account", "shipping"
    priority: int  # 1-5
    created_at: str
    last_updated: str

class CustomerProfile(BaseModel):
    customer_id: str
    name: str
    email: str
    account_created: str
    lifetime_value: float
    tier: str  # "bronze", "silver", "gold", "platinum"
    previous_issues: list[str] = []

class CustomerSupportAgentState(BaseModel):
    """Complete customer support Agent state"""
    # Messages
    messages: Annotated[Sequence[BaseMessage], add_messages]
    # Current ticket
    ticket: Optional[SupportTicket] = None
    # Customer information
    customer_profile: Optional[CustomerProfile] = None
    # Decision tracking
    issue_category: str = ""
    confidence_score: float = 0.0
    requires_escalation: bool = False
    escalation_reason: str = ""
    # Metadata
    conversation_start_time: str = ""
    tool_calls_count: int = 0
    customer_satisfaction: float = 0.0
```
Tool Integration (CRM, knowledge base)

```python
from datetime import datetime
from langchain.tools import tool, ToolRuntime

@tool
def query_customer_account(runtime: ToolRuntime) -> str:
    """Look up customer account information"""
    customer_id = runtime.context.get("customer_id")
    # Simulate a CRM lookup
    customer_data = {
        "customer_123": {
            "name": "Alice Johnson",
            "email": "alice@example.com",
            "tier": "gold",
            "account_age": "3 years",
            "lifetime_value": "$15,000"
        }
    }
    data = customer_data.get(customer_id)
    if data:
        # Save to long-term memory
        runtime.store.put(
            ("customers",),
            customer_id,
            data
        )
        return f"Account: {data['name']}, Tier: {data['tier']}, LTV: {data['lifetime_value']}"
    return "Customer not found"

@tool
def search_knowledge_base(query: str) -> str:
    """Search the knowledge base for relevant articles"""
    kb = {
        "return policy": "Items can be returned within 30 days for a full refund...",
        "shipping": "Standard shipping takes 5-7 business days...",
        "password reset": "Click 'Forgot Password' on the login page...",
    }
    for key, content in kb.items():
        if query.lower() in key.lower():
            return content
    return "No matching articles found"

@tool
def escalate_to_agent(reason: str, runtime: ToolRuntime) -> str:
    """Escalate to a human agent"""
    ticket_id = runtime.state.get("ticket", {}).get("ticket_id")
    # Record the escalation
    runtime.store.put(
        ("escalations",),
        ticket_id,
        {
            "reason": reason,
            "timestamp": str(datetime.now()),
            "assigned_agent": "agent_team"
        }
    )
    return f"Escalated to human agent. Your ticket ID is {ticket_id}. An agent will contact you shortly."

tools = [
    query_customer_account,
    search_knowledge_base,
    escalate_to_agent,
]
```
Middleware in Production

```python
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from langgraph.store.memory import InMemoryStore

# 1. Safety middleware
def safety_middleware():
    def wrap_model_call(request, handler):
        # Check for sensitive terms
        for msg in request.messages:
            if "credit card" in msg.content.lower():
                msg.content = msg.content.replace("credit card", "[redacted]")
        return handler(request)
    return {"wrap_model_call": wrap_model_call}

# 2. Dynamic model selection
def smart_model_selection():
    gpt4_mini = ChatOpenAI(model="gpt-4o-mini")
    gpt4 = ChatOpenAI(model="gpt-4o")
    def wrap_model_call(request, handler):
        ticket = request.state.get("ticket")
        # VIP customer + high priority → use the best model
        if (request.state.get("customer_profile", {}).get("tier") == "platinum" and
                ticket and ticket.priority >= 4):
            model = gpt4
        else:
            model = gpt4_mini
        return handler({**request, "model": model})
    return {"wrap_model_call": wrap_model_call}

# 3. Feedback collection
def satisfaction_tracking_middleware():
    def after_model(request, response):
        # Periodically ask for a satisfaction rating
        if len(request.messages) % 5 == 0:  # every 5 messages
            response.content += "\n\n*How satisfied are you with my help? (1-5)*"
        return response
    return {"after_model": after_model}

# Create the Agent
agent = create_agent(
    model=ChatOpenAI(model="gpt-4o-mini"),
    tools=tools,
    system_prompt="""
You are a helpful customer support agent.
- Be empathetic and professional
- Try to resolve issues yourself before escalating
- For VIP customers (tier=platinum), provide premium support
- Always maintain context from customer profile
""",
    store=InMemoryStore(),
    middleware=[
        safety_middleware(),
        smart_model_selection(),
        satisfaction_tracking_middleware(),
    ]
)
```
The full flow:

```python
from datetime import datetime
from langchain_core.messages import HumanMessage

# Initialize
config = {"configurable": {"thread_id": "ticket_001"}}
customer_id = "customer_123"

# First message
result = agent.invoke(
    {
        "messages": [HumanMessage(content="I can't log into my account")],
        "ticket": SupportTicket(
            ticket_id="ticket_001",
            customer_id=customer_id,
            status="open",
            category="account",
            priority=2,
            created_at=str(datetime.now()),
            last_updated=str(datetime.now())
        ),
    },
    config=config
)

# The Agent will:
# 1. Look up the customer profile (and store it in long-term memory)
# 2. Search the knowledge base for "password reset"
# 3. Pick a model based on the customer's tier
# 4. Generate a helpful reply
# 5. Track token usage and satisfaction

print(result["messages"][-1].content)
```
Summary and Best Practices
Custom State:
✅ Only add fields you actually need
✅ Use type hints for type safety
✅ Add validation and business logic
Middleware:
✅ Define once, use everywhere
✅ Keep each middleware small (single responsibility)
✅ Compose multiple middleware to solve complex problems
Multi-Agent:
✅ Subgraphs for context isolation
✅ Shared State keys for data exchange
✅ Thread reuse for true multi-Agent conversations
Memory and performance:
✅ Short-term memory (messages) is managed automatically
✅ Long-term memory (Store) persists across sessions
✅ Multiple strategies for preventing context overflow
✅ Streaming and batching to optimize performance
Next: in Part 3, we'll cover LangSmith observability and evaluation to make your Agent smarter and more reliable.
Related documentation: