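Agent Memory Management

A Complete Guide to Conversation Memory in LangChain v1.x: LCEL Practices from Structured Output to Summary Memory

These are personal study notes written against LangChain 1.2.15. They replace the deprecated LLMChain, ConversationChain, and the legacy Memory classes with LCEL-based equivalents, use Zhipu AI's GLM-4 as the model, and cover structured output, streaming output, and four common conversation memory strategies. None of the examples depend on the deprecated classes.

Preface

LangChain 1.x favors LCEL (LangChain Expression Language) pipe-style composition over the legacy chain classes, and APIs such as ConversationBufferMemory and LLMChain are being phased out. Today I worked through a complete set of exercises covering structured output with streaming, plus four conversation memory strategies that range from a basic full-history buffer to a summary buffer, all built with LCEL. The notes below are organized so the code can be reused directly.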

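1. Environment Setup and LLM Initialization

1.1 Installing Dependencies

Install the required packages with the command below, and keep the versions compatible with each other: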

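# httpx, httpx-sse and PyJWT are the packages the ChatZhipuAI integration docs ask for
pip install langchain==1.2.15 langchain-core langchain-community pydantic tiktoken zhipuai httpx httpx-sse PyJWT

1.2 Initializing the Zhipu AI Model

Every example below builds on this step. Define the model instance once, and all later chains reuse the same chat object: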

import os
from langchain_community.chat_models import ChatZhipuAI

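# Fill in your Zhipu AI API key
key = "your-zhipuai-api-key"
os.environ["ZHIPUAI_API_KEY"] = key

# Initialize the chat model
chat = ChatZhipuAI(
    model="glm-4",
    temperature=0.5,  # sampling randomness between 0 and 1; lower values give more deterministic output
)

2. Structured Output and Streaming Output

Many use cases need the model to return a fixed JSON format while also streaming its reply for a typewriter effect. This section combines PydanticOutputParser with LCEL streaming, which replaces the legacy LLMChain.

2.1 Imports and Output Schema

First define the output schema with Pydantic so the model's reply can be validated against the expected format: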

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field

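# Output schema for answers to math questions
class MathAnswer(BaseModel):
    answer: str = Field(description="The answer to the math question")
    confidence: float = Field(description="Confidence in the answer, between 0 and 1")

# Create the parser
output_parser = PydanticOutputParser(pydantic_object=MathAnswer)

2.2 Prompt Template and LCEL Chain

Build the prompt with PromptTemplate, then join the prompt and the model with the LCEL pipe operator |. This is the replacement for LLMChain: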

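# Prompt template with the format instructions filled in up front
template = """You are a math teacher. Answer the question in a {style} style and return the answer and a confidence score as JSON:
Question: {question}
{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["style", "question"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

# [Key step] Build the chain with the LCEL pipe, replacing the legacy LLMChain
chain = prompt | chat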
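
To get a feel for what the pipe operator does, here is a plain-Python sketch of the idea. It is not LangChain's implementation: the Step class and the two toy steps are made up for illustration. Each step wraps a function, and | returns a new step that feeds the left side's output into the right side.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a single-argument function."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # left | right -> a new Step that runs left first, then right
        return Step(lambda value: other.invoke(self.invoke(value)))


# Two toy steps standing in for "prompt" and "chat"
fill_prompt = Step(lambda d: f"Answer in a {d['style']} style: {d['question']}")
fake_model = Step(lambda text: f"[reply to] {text}")

toy_chain = fill_prompt | fake_model
reply = toy_chain.invoke({"style": "concise", "question": "What is 2 + 2?"})
print(reply)
```

Because the result of | is itself a step, chains of any length can be built the same way.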

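2.3 Streaming the Call and Parsing the Result

Call .stream() instead of the legacy .run() to get streaming output. The loop prints each chunk as it arrives for the typewriter effect and also collects the chunks so the complete reply is available afterwards: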

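chunks = []
# Stream the chain's output
for chunk in chain.stream({"style": "lively and fun", "question": "What is the Pythagorean theorem?"}):
    # Print each piece as it arrives for a typewriter effect
    print(chunk.content, end='', flush=True)
    chunks.append(chunk)

# Join the streamed pieces and validate them against MathAnswer
full_text = "".join(chunk.content for chunk in chunks)
result = output_parser.parse(full_text)
print("\nParsed:", result.answer, result.confidence)

# Sample of the complete streamed output (actual wording varies between runs)
# {
#   "answer": "The Pythagorean theorem is like a bit of mathematical magic! It says that in a right triangle, the sum of the squares of the two legs equals the square of the hypotenuse. As a formula: a² + b² = c², where a and b are the legs and c is the hypotenuse.",
#   "confidence": 1.0
# }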
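
One detail worth spelling out: with a plain JSON parser, individual chunks are only fragments and cannot be parsed on their own, so parsing has to wait for the full reply. The sketch below shows this with the standard library only. fake_stream is a hypothetical stand-in for chain.stream(), and MathAnswerSketch is a simplified dataclass, not the Pydantic model above.

```python
import json
from dataclasses import dataclass


@dataclass
class MathAnswerSketch:
    answer: str
    confidence: float


def fake_stream():
    """Hypothetical stand-in for chain.stream(): yields the reply in pieces."""
    yield from ['{"answer": "a^2 + ', 'b^2 = c^2", ', '"confidence": 0.9}']


def parse_answer(text: str) -> MathAnswerSketch:
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return MathAnswerSketch(answer=str(data["answer"]), confidence=confidence)


pieces = []
for piece in fake_stream():
    print(piece, end="", flush=True)  # typewriter effect
    pieces.append(piece)

# A single fragment is not valid JSON; the joined text is
try:
    json.loads(pieces[0])
    first_piece_parses = True
except json.JSONDecodeError:
    first_piece_parses = False

parsed = parse_answer("".join(pieces))
```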

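3. Four Conversation Memory Strategies (Built with LCEL)

Memory management is the core of multi-turn conversation. The four strategies below replace the legacy Memory classes with LCEL building blocks, which are more flexible and easier to extend.

3.1 Strategy 1: Full Conversation Memory (replaces ConversationBufferMemory)

3.1.1 How It Works

This is the most basic strategy. InMemoryChatMessageHistory stores the complete conversation, and RunnableWithMessageHistory binds the chain to that store. A session_id distinguishes users or sessions, and calls with the same ID share the same history.

3.1.2 Full Implementation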

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# ========== 1. Memory store: replaces ConversationBufferMemory ==========
# Module-level store; session_id separates conversations
chat_history_store = {}

def get_chat_history(session_id: str):
    """Return the history for a session, creating it on first use."""
    if session_id not in chat_history_store:
        chat_history_store[session_id] = InMemoryChatMessageHistory()
    return chat_history_store[session_id]

# ========== 2. Chat prompt template ==========
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant. Use the conversation so far when replying to the user."),
    MessagesPlaceholder(variable_name="history"),  # slot for past messages
    ("human", "{input}")
])

# ========== 3. Chain with memory: replaces ConversationChain ==========
base_chain = prompt | chat
conversation_with_memory = RunnableWithMessageHistory(
    runnable=base_chain,
    get_session_history=get_chat_history,
    input_messages_key="input",
    history_messages_key="history",
)

3.1.3 Test Run and Notes

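# Round 1
res1 = conversation_with_memory.invoke(
    {"input": "Hello, I'm a programmer"},
    config={"configurable": {"session_id": "user_001"}}  # the same id keeps the same memory
)
print("Round 1 reply:", res1.content)
# Sample output: Hello! Nice to meet you, fellow programmer! Is there anything I can help you with? ...

# Round 2: the model can see the previous round
res2 = conversation_with_memory.invoke(
    {"input": "Do you know what I do for a living?"},
    config={"configurable": {"session_id": "user_001"}}
)
print("Round 2 reply:", res2.content)
# Sample output: Yes, you just told me you're a programmer. If you'd like to share more...

# Inspect the full conversation history
print("\nFull conversation history:")
print(chat_history_store["user_001"].messages)

Note: this strategy suits short conversations. It keeps every message, so no information is lost, but token usage grows with every turn of a long conversation and can eventually exceed the model's context window.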
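
The session-keyed store is the part that is easy to overlook, so here is a standard-library sketch of just that idea. The sessions dict and record_turn helper are hypothetical names for illustration; no model is called. It shows that each session grows by two messages per round and that sessions do not see each other's history.

```python
from collections import defaultdict

# session_id -> list of (role, text) tuples; a plain-dict stand-in for the history store
sessions = defaultdict(list)


def record_turn(session_id: str, user_text: str, ai_text: str) -> int:
    """Append one round (human + ai) and return the session's message count."""
    sessions[session_id].append(("human", user_text))
    sessions[session_id].append(("ai", ai_text))
    return len(sessions[session_id])


for turn in range(1, 4):
    count = record_turn("user_001", f"question {turn}", f"answer {turn}")
    print(f"user_001 after turn {turn}: {count} messages")

record_turn("user_002", "hello", "hi")
print(len(sessions["user_001"]), len(sessions["user_002"]))
```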

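3.2 Strategy 2: Sliding Window Memory (replaces ConversationBufferWindowMemory)

3.2.1 How It Works

The trim_messages utility implements a sliding window that keeps only the most recent N rounds and leaves older messages out of the prompt. It stands in for the legacy ConversationBufferWindowMemory(k=3) and keeps long conversations from inflating the token count.

Pitfall: I first set max_tokens=3 with token_counter=len, intending to keep 3 rounds. With token_counter=len, max_tokens counts messages rather than rounds, so the trimmer kept only the last 3 messages, and start_on="human" then dropped the leading AI message, leaving a single round. I also suspected include_system=True at first, but the history passed in here contains no system message, so that setting has no effect. The code below fixes the problem by counting 6 messages for 3 rounds.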
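
The arithmetic behind this pitfall is easy to check in plain Python. The keep_last function below is my own imitation of the behavior described above (keep the last N messages, then drop leading messages until the window starts with a human message). It is not the library code, and the tuple-based messages are a simplification.

```python
def keep_last(messages, max_count, start_on="human"):
    """Keep the last max_count messages, then drop leading messages
    until the window starts with a start_on message."""
    window = messages[-max_count:] if max_count > 0 else []
    while window and window[0][0] != start_on:
        window = window[1:]
    return window


history = [
    ("human", "h1"), ("ai", "a1"),
    ("human", "h2"), ("ai", "a2"),
    ("human", "h3"), ("ai", "a3"),
    ("human", "h4"), ("ai", "a4"),
]

one_round = keep_last(history, 3)     # last 3 = a3, h4, a4; leading a3 is dropped
three_rounds = keep_last(history, 6)  # last 6 already start with a human message
print(one_round)
print(three_rounds)
```

With a count of 3 only one full round survives, while a count of 6 keeps the intended three.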

3.2.2 Full Implementation (pitfall fixed)

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.messages import trim_messages
from operator import itemgetter

# ========== 1. Session store ==========
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# ========== 2. Prompt template ==========
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# ========== 3. [Key step] Sliding window: keep only the last 3 rounds ==========
trimmer = trim_messages(
    max_tokens=6,  # 1 round = 2 messages (human + ai), so 3 rounds = 6; counted in messages here
    strategy="last",  # keep the newest messages
    token_counter=len,  # key point: count messages, not tokens
    include_system=True,  # no effect here: the history contains no system message
    allow_partial=False,  # never cut a message in half
    start_on="human",  # the kept window must begin with a human message so rounds stay complete
)

# ========== 4. Build the LCEL chain ==========
chain = (
    # Step 1: take the history and trim it
    {
        "input": itemgetter("input"),
        "history": itemgetter("history") | trimmer,
    }
    # Step 2: fill the prompt
    | prompt
    # Step 3: call the model
    | chat
)

# ========== 5. Bind the memory ==========
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

3.2.3 Test Run and Notes

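# Send 4 turns in a row. On the 4th call the history holds only 3 rounds, so nothing is trimmed yet;
# from the 5th call on, the oldest round is left out of the prompt.
with_message_history.invoke({"input": "1. Hello"}, config={"configurable": {"session_id": "window_demo"}})
with_message_history.invoke({"input": "2. My name is Xiao Ming"}, config={"configurable": {"session_id": "window_demo"}})
with_message_history.invoke({"input": "3. I like programming"}, config={"configurable": {"session_id": "window_demo"}})
response = with_message_history.invoke({"input": "4. What is my name?"}, config={"configurable": {"session_id": "window_demo"}})

print(response.content)
# Inspect the store (it keeps every message; trimming only affects what the model sees)
print(store["window_demo"].messages)

Note: this strategy suits everyday conversations of moderate length. It is simple to implement and keeps token usage bounded, but anything that falls outside the window is no longer visible to the model, so it is a poor fit when long-term memory matters.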
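
A quick way to see when the window actually starts dropping rounds is to list which rounds of history each call can see. The helper below is a standalone illustration with hypothetical names, assuming a window of 3 rounds as in the trimmer above and one stored round per completed call.

```python
WINDOW_ROUNDS = 3  # matches max_tokens=6 with two messages per round


def visible_rounds(call_number: int, window: int = WINDOW_ROUNDS) -> list[int]:
    """Rounds of history the model can see on a given call (1-based).

    Before call n the store holds rounds 1..n-1; the window keeps the last `window` of them.
    """
    stored = list(range(1, call_number))
    return stored[-window:]


for n in range(1, 7):
    print(f"call {n}: sees rounds {visible_rounds(n)}")
```

The fourth call still sees rounds 1 to 3, and round 1 first drops out on the fifth call.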

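3.3 Strategy 3: Token-Limited Sliding Window Memory (replaces ConversationTokenBufferMemory)

3.3.1 How It Works

Unlike the count-based window, this strategy uses tiktoken to count the tokens in the history and caps the history at a token budget, which tracks the model's context window more closely than a message count does. It replaces the legacy ConversationTokenBufferMemory. Keep in mind that tiktoken implements OpenAI's tokenizers, so for GLM-4 the count is an estimate rather than an exact figure.

3.3.2 Full Implementation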

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.messages import trim_messages
from operator import itemgetter
import tiktoken

# ========== 1. Key piece: token counter for the conversation ==========
def tiktoken_counter(messages):
    """Estimate the token count with OpenAI's tiktoken (an approximation for non-OpenAI models such as GLM-4)."""
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    num_tokens = 0
    for msg in messages:
        # Fixed overhead per message
        num_tokens += 4
        num_tokens += len(encoding.encode(msg.content))
    num_tokens += 2  # fixed overhead for the model's reply
    return num_tokens

# ========== 2. Session store ==========
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# ========== 3. Prompt template ==========
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# ========== 4. [Key step] Token-based sliding trimmer ==========
trimmer = trim_messages(
    max_tokens=300,  # token budget for the history
    strategy="last",  # keep the newest messages
    token_counter=tiktoken_counter,  # use the custom token counter
    include_system=True,
    allow_partial=False,
    start_on="human",
)

# ========== 5. Build the LCEL chain ==========
chain = (
    {
        "input": itemgetter("input"),
        "history": itemgetter("history") | trimmer,  # trim the history before it reaches the prompt
    }
    | prompt
    | chat
)

# ========== 6. Bind the memory ==========
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

3.3.3 Test Run and Notes

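# Multi-turn test
with_message_history.invoke({"input": "1. Hello"}, config={"configurable": {"session_id": "token_demo"}})
with_message_history.invoke({"input": "2. My name is Xiao Ming"}, config={"configurable": {"session_id": "token_demo"}})
with_message_history.invoke({"input": "3. What fruit is good to eat in summer? I love apples"}, config={"configurable": {"session_id": "token_demo"}})
response = with_message_history.invoke({"input": "4. What fruit do I love?"}, config={"configurable": {"session_id": "token_demo"}})

print(response.content)
# Sample output: Xiao Ming, you told me earlier that you love apples. Apples are a very healthy fruit...

Note: this is a common baseline for production use. The history is capped at a token budget, which makes it much easier to stay within the model's context window, although the system prompt, the new input, and the reply also count toward that window. The drawback is unchanged: early turns still drop out of the prompt.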
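
The budget logic itself can be sketched without any library. In the example below, rough_tokens is a hypothetical counter that charges one token per whitespace-separated word plus the same fixed overheads as the counter above; it is only there to make the arithmetic easy to follow and is not a real tokenizer. trim_to_budget keeps the longest tail of the history that fits the budget and starts with a human message.

```python
PER_MESSAGE_OVERHEAD = 4  # same fixed per-message cost the counter above assumes
REPLY_OVERHEAD = 2


def rough_tokens(messages):
    """Hypothetical counter: one token per whitespace-separated word, plus overheads."""
    total = REPLY_OVERHEAD
    for _role, text in messages:
        total += PER_MESSAGE_OVERHEAD + len(text.split())
    return total


def trim_to_budget(messages, max_tokens):
    """Keep the longest suffix that fits the budget and starts with a human message."""
    for start in range(len(messages)):
        suffix = messages[start:]
        if suffix[0][0] == "human" and rough_tokens(suffix) <= max_tokens:
            return suffix
    return []


history = [
    ("human", "my name is Xiao Ming"),
    ("ai", "nice to meet you Xiao Ming"),
    ("human", "I like apples"),
    ("ai", "apples are a healthy fruit"),
]

print(rough_tokens(history))          # 2 + (4+5) + (4+6) + (4+3) + (4+5) = 37
print(trim_to_budget(history, 20))    # only the last round fits (18 tokens)
print(trim_to_budget(history, 40))    # everything fits
```

With a budget of 20 the name from the first round is no longer in the prompt, which is exactly the information loss this strategy accepts.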

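3.4 Strategy 4: Summary Buffer Memory (replaces ConversationSummaryBufferMemory)

3.4.1 How It Works

This is the most balanced strategy for long conversations. When the history exceeds the token limit, the early turns are not simply discarded. Instead, the model condenses them into a summary, which is placed in front of the most recent turns. Token usage stays bounded while the gist of the early conversation is preserved. It replaces the legacy ConversationSummaryBufferMemory.

Pitfall: my first version raised an error because the history that RunnableWithMessageHistory passes into the chain is already a list of messages, so calling .messages on it fails. The code below fixes this.

3.4.2 Full Implementation (error fixed)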

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.messages import SystemMessage
from operator import itemgetter
import tiktoken

# ========== 1. Helper 1: token counter ==========
def count_tokens(messages):
    """Estimate the token count with tiktoken to keep the context length in check."""
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    num_tokens = 0
    for msg in messages:
        num_tokens += 4  # fixed overhead per message
        num_tokens += len(encoding.encode(msg.content))
    num_tokens += 2  # fixed overhead for the reply
    return num_tokens

# ========== 2. Helper 2: conversation summarizer ==========
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the following conversation history into a concise summary. Keep the core information and do not drop key details."),
    ("human", "{chat_history}")
])
summary_chain = summary_prompt | chat

# ========== 3. Session store ==========
store = {}
def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# ========== 4. [Key step] Summary buffer logic ==========
MAX_TOKEN_LIMIT = 500  # token limit; above it, the early turns are summarized
def process_history(history):
    """
    Fix note:
    the history passed in here is already a list of messages, so there is no .messages to call
    """
    messages = history
    total_tokens = count_tokens(messages)

    # Within the limit: return the full history unchanged
    if total_tokens <= MAX_TOKEN_LIMIT:
        return messages

    # Over the limit: summarize the early turns and keep the latest 4 messages verbatim
    split_index = max(len(messages) - 4, 0)
    early_messages = messages[:split_index]
    latest_messages = messages[split_index:]

    # Nothing older than the kept tail: skip the summary call instead of summarizing an empty string
    if not early_messages:
        return latest_messages

    # Summarize the early turns
    chat_history_text = "\n".join([f"{msg.type}: {msg.content}" for msg in early_messages])
    summary = summary_chain.invoke({"chat_history": chat_history_text})

    # Return summary + latest turns: the gist is kept and the length stays bounded
    return [SystemMessage(content=f"Summary of the earlier conversation: {summary.content}")] + latest_messages

# ========== 5. Main chat prompt ==========
main_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant. Use the conversation history when answering the user's question."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# ========== 6. Build the LCEL chain ==========
base_chain = (
    RunnablePassthrough.assign(
        history=itemgetter("history") | RunnableLambda(process_history)
    )
    | main_prompt
    | chat
)

# Bind the memory
conversation_with_summary_memory = RunnableWithMessageHistory(
    runnable=base_chain,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

3.4.3 Test Run and Notes

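print("--- Round 1 ---")
r1 = conversation_with_summary_memory.invoke(
    {"input": "Hello, I'm a programmer and I've recently been learning the LangChain framework"},
    config={"configurable": {"session_id": "summary_demo"}}
)
print(r1.content)

print("\n--- Round 2 ---")
r2 = conversation_with_summary_memory.invoke(
    {"input": "I mainly use Python to build AI applications"},
    config={"configurable": {"session_id": "summary_demo"}}
)
print(r2.content)

print("\n--- Round 3 ---")
r3 = conversation_with_summary_memory.invoke(
    {"input": "Right now I'm working on memory management for multi-turn conversations"},
    config={"configurable": {"session_id": "summary_demo"}}
)
print(r3.content)

print("\n--- Final test ---")
final_response = conversation_with_summary_memory.invoke(
    {"input": "What do I do? What have I been learning recently?"},
    config={"configurable": {"session_id": "summary_demo"}}
)
print(final_response.content)
# Sample output: Based on our earlier conversation, you are a programmer who mainly uses Python for AI application development. You have recently been learning the LangChain framework, with a focus on memory management for multi-turn conversations.

Note: this strategy is my first choice for long conversations. It balances context completeness against token usage and keeps the core information, which suits customer service, assistants, and other production scenarios with long-running conversations. The drawbacks are the extra tokens spent on generating summaries and the dependence on how well the summary prompt works.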
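
The control flow of the summary buffer is easier to test when the model is taken out of the picture. The sketch below uses only the standard library: stub_summarize is a hypothetical placeholder that joins the early messages instead of calling a model, and word_count is a stand-in for the token counter. It also covers the edge case where the history is over the limit but has nothing older than the kept tail.

```python
KEEP_LATEST = 4  # number of most recent messages kept verbatim


def stub_summarize(messages):
    """Hypothetical summarizer: a real one would call the model."""
    return "; ".join(text for _role, text in messages)


def summary_buffer(messages, count_tokens, max_tokens, summarize=stub_summarize):
    """Return the history unchanged while it fits; otherwise replace everything
    except the latest KEEP_LATEST messages with a single summary message."""
    if count_tokens(messages) <= max_tokens:
        return messages
    early, latest = messages[:-KEEP_LATEST], messages[-KEEP_LATEST:]
    if not early:  # nothing older than the kept tail, so there is nothing to summarize
        return latest
    return [("system", f"Summary of earlier conversation: {summarize(early)}")] + latest


def word_count(messages):
    """Stand-in token counter: one token per whitespace-separated word."""
    return sum(len(text.split()) for _role, text in messages)


history = [("human", f"fact {i}") if i % 2 == 0 else ("ai", f"reply {i}") for i in range(6)]

compressed = summary_buffer(history, word_count, max_tokens=8)
untouched = summary_buffer(history, word_count, max_tokens=20)
print(compressed[0])
```

In this sketch the summary is recomputed on every call that is over the limit; storing it back into the session history would avoid repeated summary calls, at the cost of more bookkeeping.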

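4. Comparison of the Four Memory Strategies

Memory strategy	Main strength	Main weakness	Best suited for
Full conversation memory	Keeps the entire context with no information loss	Token usage balloons in long conversations and can exceed the model's window	Short conversations, Q&A, testing and debugging
Sliding window memory	Simple to implement, bounded token usage	Early turns drop out of the prompt	Casual chat, medium-length conversations, cases with no need for long-term memory
Token-limited sliding window	Caps the history by token count, tracks the model's window closely	Early turns still drop out of the prompt	Baseline production chat and most general conversation scenarios
Summary buffer memory	Balances completeness against token usage and keeps the core information	Extra tokens spent on summaries, slightly more complex to implement	Long conversations, customer service systems, assistants, production scenarios that need long-term memory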
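
To put a rough number on the "token usage balloons" entry, the sketch below adds up the history tokens sent over a whole conversation with and without a window. The figure of 100 tokens per round is an assumption chosen for easy arithmetic, not a measurement, and the function name is hypothetical.

```python
def history_tokens_sent(turns: int, tokens_per_round: int, window_rounds=None) -> int:
    """Total history tokens sent across `turns` calls.

    Call n carries the previous n-1 rounds, or at most `window_rounds` of them.
    """
    total = 0
    for call in range(1, turns + 1):
        rounds_in_prompt = call - 1
        if window_rounds is not None:
            rounds_in_prompt = min(rounds_in_prompt, window_rounds)
        total += rounds_in_prompt * tokens_per_round
    return total


full = history_tokens_sent(turns=50, tokens_per_round=100)
windowed = history_tokens_sent(turns=50, tokens_per_round=100, window_rounds=3)
print(full, windowed)
```

Under these assumptions the full history sends 100 × (0 + 1 + ... + 49) = 122,500 history tokens over 50 turns, while a 3-round window sends 14,400. The full-history cost grows quadratically with the number of turns, and the windowed cost grows linearly.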

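5. Summary and Best Practices

LCEL is the direction to follow: in LangChain 1.x, LCEL pipe composition replaces the legacy chain classes. It is more concise, composes and extends more easily, and avoids the deprecated APIs.
Use Pydantic for structured output: PydanticOutputParser supplies format instructions for the prompt and validates the reply against the schema, so malformed output is caught at parse time instead of surfacing later.
Choosing a memory strategy: use full history for testing and debugging, a token-based sliding window for simple scenarios, and the summary buffer first for long production conversations.
Streaming: .stream() gives a typewriter effect with little effort and improves the user experience. Collect the chunks as they arrive so the content can be parsed and handed to the front end.

That wraps up today's notes. I ran all of the code while writing them, and it follows the current standard APIs rather than the deprecated ones, so it should serve as a starting point for both study and production work.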