Breathing Life into the Model: Memory Management and Tool Calling

Contents

1. Conversation History Management (Memory)
- 1.1 Combining State Management with a Prompt
- 1.2 Key Technique: Trimming Messages
2. Tool Calling
- 2.1 Defining Tools: Docstrings and Type Hints Are the Soul
- 2.2 Binding Tools
- 2.3 Executing Tools Manually: the Logic Underneath an Agent
Summary

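Core pain point: a large language model is stateless on its own (it remembers nothing between calls) and closed (it has no hands, so it cannot browse the web or query a database). This article shows how to use Memory so the model keeps conversational context, and Tool Calling so it can interact with the outside world.

Learning objectives:

Use RunnableWithMessageHistory to implement multi-turn conversations.

Use trim_messages to trim the history and keep token costs under control.

Follow the conventions for defining tools with the @tool decorator (docstrings and type hints).

Understand, and write by hand, the full tool-calling loop (the model selects a tool -> your code runs the Python function -> the result goes back to the model).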

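1. Conversation History Management (Memory)

Before LangChain 0.1, most people used ConversationChain and ConversationBufferMemory. The officially recommended replacement is the more flexible RunnableWithMessageHistory, which fits LCEL syntax well.

1.1 Combining State Management with a Prompt

A backend usually has to keep a separate chat history for each user (a session). In practice we rarely pass a raw message list straight to the model; we combine the history with a ChatPromptTemplate.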


from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# 1. Simple in-memory store (for production, consider a persistent backend such as RedisChatMessageHistory)
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the history for session_id, creating it on first use."""
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# 2. Define a prompt with a placeholder for the history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant."),
    MessagesPlaceholder(variable_name="chat_history"), # Past messages are inserted here
    ("human", "{input}")
])

model = ChatOpenAI(model="gpt-4o-mini")
chain = prompt | model

# 3. Wrap the chain to give it memory
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",          # Which input key holds the user's latest message
    history_messages_key="chat_history", # Which prompt variable receives the history
)

# 4. Test a multi-turn conversation
# First turn
response1 = with_message_history.invoke(
    {"input": "Hello, I'm Xiao Ming, and my favorite food is apples."},
    config={"configurable": {"session_id": "user_123"}} # The session_id travels in config
)
print(response1.content)

# Second turn (same session_id, so the first turn is visible to the model)
response2 = with_message_history.invoke(
    {"input": "What is my name? What do I like to eat?"},
    config={"configurable": {"session_id": "user_123"}}
)
print(response2.content) # Example output: Your name is Xiao Ming, and your favorite food is apples.
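
To make the mechanism less magical, here is a library-free sketch of what a per-session history wrapper does conceptually: load the session's history, call the stateless model with history plus the new input, then save both sides of the turn. This is not how RunnableWithMessageHistory is actually implemented; `with_history`, `echo_model`, and `histories` are hypothetical names made up for the illustration, and the "model" is a stand-in that only counts the messages it receives.

```python
from collections import defaultdict
from typing import Callable

Message = tuple[str, str]  # (role, text)

# Hypothetical in-memory store: session_id -> list of messages.
histories: dict[str, list[Message]] = defaultdict(list)


def with_history(model: Callable[[list[Message]], str]) -> Callable[[str, str], str]:
    """Wrap a stateless model so that each session sees its own past messages."""

    def invoke(session_id: str, user_input: str) -> str:
        history = histories[session_id]
        # 1. Load the stored history and add the new user input.
        messages = history + [("human", user_input)]
        # 2. Call the stateless model with the full message list.
        reply = model(messages)
        # 3. Persist both the input and the reply for the next turn.
        history.append(("human", user_input))
        history.append(("ai", reply))
        return reply

    return invoke


def echo_model(messages: list[Message]) -> str:
    """Stand-in for an LLM: reports how many messages it was given."""
    return f"I can see {len(messages)} message(s)."


chat = with_history(echo_model)
first = chat("user_123", "Hi, I'm Xiao Ming.")
second = chat("user_123", "What is my name?")
other = chat("user_456", "Hello")
```

The second call in session `user_123` sees three messages (the first exchange plus the new question), while `user_456` starts from an empty history. That isolation by session key is the whole point of passing `session_id` through `config`.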

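1.2 Key Technique: Trimming Messages

Without trimming, a long conversation will eventually exceed the model's context window, and API costs climb with every turn because the whole history is resent each time. LangChain provides the trim_messages utility for this.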


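from operator import itemgetter

from langchain_core.messages import trim_messages
from langchain_core.runnables import RunnablePassthrough

# 1. Define the trimmer (called without a message list, trim_messages returns a Runnable)
trimmer = trim_messages(
    max_tokens=100,            # Token budget for the kept messages (tune to the model's context window)
    strategy="last",           # Keep the most recent messages
    token_counter=model,       # Count tokens with the model's own counting method
    include_system=True,       # Keep a leading SystemMessage if the list has one (here the system prompt lives in the template)
    start_on="human",          # Make the trimmed history start with a HumanMessage (many models require this)
)

# 2. Compose the trimmer into the chain
# RunnablePassthrough.assign passes the whole input dict to each assigned runnable,
# so pick out chat_history with itemgetter before handing it to the trimmer.
chain_with_trimming = (
    RunnablePassthrough.assign(chat_history=itemgetter("chat_history") | trimmer)
    | prompt
    | model
)

# 3. Wrap with history again
with_message_history_trimmed = RunnableWithMessageHistory(
    chain_with_trimming,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)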
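
The "last" strategy is easier to reason about once you see it spelled out. The following is a minimal, library-free sketch of the idea, not the trim_messages implementation: walk backwards from the newest message, keep whatever fits in the budget, then drop leading messages until the window starts on a human turn. `trim_last` and `count_tokens` are hypothetical helpers, the "tokenizer" is just a whitespace word count, and system-message handling is left out.

```python
Message = tuple[str, str]  # (role, text)


def count_tokens(message: Message) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(message[1].split())


def trim_last(messages: list[Message], max_tokens: int) -> list[Message]:
    """Keep the newest messages that fit in max_tokens, starting on a human turn."""
    kept: list[Message] = []
    budget = max_tokens
    # Walk backwards from the newest message until the budget is exhausted.
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    # Drop leading non-human messages so the window begins with a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return kept


history = [
    ("human", "hello there"),
    ("ai", "hi how can I help"),
    ("human", "what is LCEL"),
    ("ai", "a way to compose runnables"),
    ("human", "thanks"),
]
trimmed = trim_last(history, max_tokens=10)
```

With a budget of 10 words, the last three messages survive. With a budget of 7, only the final human message remains, because the AI message that fits would otherwise leave the window starting on an AI turn.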

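2. Tool Calling

Letting an LLM use external tools (search, a calculator, a database query) is the core of building an Agent.

2.1 Defining Tools: Docstrings and Type Hints Are the Soul

The @tool decorator is the simplest way to define a tool. Note that the function's docstring and its parameter type hints are extremely important: they become the tool description and argument schema that the model sees, and the model relies on them to decide when to call the tool and how to pass arguments.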


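from langchain_core.tools import tool
from pydantic import BaseModel, Field

# Option 1: a plain function (a Google-style docstring is recommended)
@tool
def add(a: int, b: int) -> int:
    """Add two integers.

    Args:
        a: The first integer
        b: The second integer
    """
    return a + b

# Option 2: a Pydantic model for the arguments (suits tools with many parameters or strict validation)
class WeatherInput(BaseModel):
    location: str = Field(description="City name, for example: Beijing, Shanghai")
    unit: str = Field(description="Temperature unit, one of: celsius, fahrenheit", default="celsius")

@tool(args_schema=WeatherInput)
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for the given city."""
    # unit also has a default in the signature, so the call still works
    # if the model leaves the optional argument out.
    # A real implementation would call an external API here.
    return f"The weather in {location} is sunny, 25 degrees ({unit})."

print(get_weather.name)        # "get_weather"
print(get_weather.description) # "Get the current weather for the given city."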
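
To make concrete why the docstring and type hints matter, here is a library-free sketch that turns a function signature into the kind of JSON-style description a model receives. It is only an approximation of the idea: `describe_tool`, `JSON_TYPES`, and `multiply` are hypothetical names made up for this illustration, and LangChain's real @tool builds its schema through Pydantic and covers many more cases.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python types to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a tool description from a function's name, docstring, and type hints."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # The return type is not part of the argument schema.
    signature = inspect.signature(func)
    properties = {name: {"type": JSON_TYPES[tp]} for name, tp in hints.items()}
    # Parameters without a default value must be supplied by the model.
    required = [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def multiply(a: int, b: int = 2) -> int:
    """Multiply two integers."""
    return a * b


schema = describe_tool(multiply)
```

If the docstring were missing, `description` would be an empty string, and a function without type hints would produce no `properties` at all. In both cases the model would have nothing to go on, which is the practical reason to write both carefully.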

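2.2 Binding Tools

Pass the list of tools to the model. Binding does not execute anything: it only attaches the tool schemas to each request, which in effect tells the model, "these tools are available; if you need one, return a call request in the expected format".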


tools = [add, get_weather]
llm = ChatOpenAI(model="gpt-4o-mini")

# Bind the tools
llm_with_tools = llm.bind_tools(tools)

# Test 1: an ordinary message that needs no tool
res1 = llm_with_tools.invoke("Hello")
print(res1.tool_calls) # [] (empty list)

# Test 2: a message that triggers a tool call
res2 = llm_with_tools.invoke("What's the weather like in Beijing today?")
print(res2.tool_calls)
# Output similar to: [{'name': 'get_weather', 'args': {'location': 'Beijing', 'unit': 'celsius'}, 'id': 'call_abc123', 'type': 'tool_call'}]

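2.3 Executing Tools Manually: the Logic Underneath an Agent

Agent frameworks such as LangGraph execute tools automatically, but it is still important to understand the loop underneath (the ReAct-style pattern).

A complete tool-calling loop has 4 steps:

1. The user asks a question.
2. The model decides to call a tool (it returns tool_calls).
3. Your code parses the request, runs the Python function, and wraps the result in a ToolMessage.
4. The ToolMessage is sent back to the model, which uses the tool result to write the final natural-language reply.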


from langchain_core.messages import HumanMessage

# 1. The user asks a question
messages = [HumanMessage(content="Please work out 15 plus 27 for me.")]

# 2. The model decides to call a tool
ai_msg = llm_with_tools.invoke(messages)
messages.append(ai_msg) # The AI's call request must be kept in the history too

print(f"Tool calls requested by the model: {ai_msg.tool_calls}")

# 3. Iterate over the requests and run each tool
# Build a name -> tool mapping for lookup
tool_map = {"add": add, "get_weather": get_weather}

for tool_call in ai_msg.tool_calls:
    # Look up the tool the model asked for
    selected_tool = tool_map[tool_call["name"].lower()]

    # Invoking a tool with the whole tool_call dict (not just its args)
    # returns a ToolMessage that already carries the matching tool_call_id,
    # so the model can tell which request this result answers.
    tool_result = selected_tool.invoke(tool_call)
    messages.append(tool_result)

print("Tool results appended to the message list.")

# 4. Call the model again for the final reply
final_response = llm_with_tools.invoke(messages)
print(f"Final reply: {final_response.content}")
# Output similar to: 15 plus 27 equals 42.
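
The walkthrough above runs exactly one round. A real agent repeats steps 2 to 4 until the model stops asking for tools, so a rough, library-free sketch of that outer loop may help. Everything here is an assumption made for illustration: messages are plain dicts rather than LangChain message objects, `fake_model` is a scripted stand-in for an LLM, and `TOOLS`, `run_agent`, and `max_rounds` are hypothetical names.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: requests `add` once, then answers from the tool result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "ai", "content": f"The answer is {last['content']}.", "tool_calls": []}
    call = {"name": "add", "args": {"a": 15, "b": 27}, "id": "call_1"}
    return {"role": "ai", "content": "", "tool_calls": [call]}


def run_agent(question: str, model=fake_model, max_rounds: int = 5) -> str:
    """Loop: call the model, execute requested tools, feed the results back."""
    messages = [{"role": "human", "content": question}]
    for _ in range(max_rounds):
        ai_msg = model(messages)
        messages.append(ai_msg)  # Keep the request in the history.
        if not ai_msg["tool_calls"]:
            return ai_msg["content"]  # No tool requested: this is the final answer.
        for call in ai_msg["tool_calls"]:
            tool = TOOLS.get(call["name"])
            if tool is None:
                # Report the problem to the model instead of crashing.
                result = f"Error: unknown tool {call['name']!r}"
            else:
                result = tool(**call["args"])
            # Echo the call id so the model can match the result to its request.
            messages.append({"role": "tool", "content": str(result), "tool_call_id": call["id"]})
    raise RuntimeError("Model kept requesting tools; giving up.")


answer = run_agent("What is 15 plus 27?")
```

Two design choices are worth noting: an unknown tool name is reported back to the model as an error string so it can recover, and `max_rounds` caps the loop so a model that never stops requesting tools cannot run forever.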

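Summary

Memory: RunnableWithMessageHistory combined with ChatPromptTemplate is the standard way to manage multi-turn conversations. Don't forget trim_messages to keep costs under control.
Tools: a tool lives or dies by a clear description and accurate type hints.
Execution: once you understand the flow HumanMessage -> AIMessage(tool_calls) -> ToolMessage -> AIMessage (final reply), you understand how an Agent works underneath.

The next article moves on to hands-on RAG (retrieval-augmented generation): how to let the model read your private documents.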