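(Tutorial) Building Agent Memory (prompt conversation storage) 3. ConversationTokenBufferMemory (flushing conversation content by token length; differences between versions >1.0 and <1.0)

"ConversationTokenBufferMemory" keeps the recent conversation history in a buffer and decides when to flush old content based on token length rather than on the number of conversation turns. Key parameters:

max_token_limit: the maximum token length used to store conversation content
return_messages: when True, the history is returned as chat messages; when False, it is returned as a string
human_prefix: prefix added before human messages (default: "Human")
ai_prefix: prefix added before AI messages (default: "AI")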
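
To make the flushing rule concrete, here is a minimal sketch of a token-limited buffer in plain Python. It is not LangChain's implementation: `TokenBuffer` and `approx_token_count` are hypothetical names, and the word-based counter is only a stand-in for a real tokenizer.

```python
from collections import deque


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class TokenBuffer:
    """Keeps the most recent messages whose total token count fits under a limit."""

    def __init__(self, max_token_limit: int, count_tokens=approx_token_count):
        self.max_token_limit = max_token_limit
        self.count_tokens = count_tokens
        self.messages = deque()  # (role, text) pairs, oldest first

    def total_tokens(self) -> int:
        return sum(self.count_tokens(text) for _, text in self.messages)

    def save_context(self, human: str, ai: str) -> None:
        """Append one exchange, then drop the oldest messages until the buffer fits."""
        self.messages.append(("human", human))
        self.messages.append(("ai", ai))
        while self.messages and self.total_tokens() > self.max_token_limit:
            self.messages.popleft()


buffer = TokenBuffer(max_token_limit=8)
buffer.save_context("where is my order", "it ships on Monday")  # 4 + 4 tokens, fits
buffer.save_context("thanks a lot", "you are welcome")          # pushes the total to 14
print(list(buffer.messages))  # only the second exchange remains
```

The point of the sketch is that the number of turns kept is not fixed: short messages let many turns survive, while a single long reply can push out several older ones.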

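This class comes from LangChain releases before 1.0. In 1.0 and later it is importable from the legacy langchain_classic package, while new code is expected to manage history itself, for example in LangGraph. I show how to use both approaches below, along with their characteristics.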

import os

from dotenv import load_dotenv
from langchain_classic.memory import ConversationTokenBufferMemory
from langchain_openai import ChatOpenAI

load_dotenv()

# Create a ChatOpenAI client that talks to the SiliconFlow OpenAI-compatible API
Qwen2_5_7B_Instruct_llm = ChatOpenAI(
    temperature=0.1,  # Randomness of the output: lower is more stable and predictable, higher is more creative but may drift (range: 0.0 to 2.0)
    model_name="Qwen/Qwen2.5-7B-Instruct",  # Model name supported by SiliconFlow
    openai_api_key=os.getenv("SILICONFLOW_API_KEY"),  # API key read from the environment
    openai_api_base="https://api.siliconflow.cn/v1",  # Base URL of the SiliconFlow API
    # Assumption: ChatOpenAI's message token counter may reject non-OpenAI model names,
    # so tokens are counted with an OpenAI tokenizer as an approximation
    tiktoken_model_name="gpt-3.5-turbo",
)

# Configure memory
memory = ConversationTokenBufferMemory(
    llm=Qwen2_5_7B_Instruct_llm,
    max_token_limit=50,  # Limit the stored conversation to at most 50 tokens
    return_messages=True,  # Return the history as chat message objects
)

# Add some sample conversation turns (the dialogue text is kept in Chinese)
memory.save_context(
    inputs={
        "human": "您好，我最近从贵公司购买了一台机床。请问如何安装？"
    },
    outputs={
        "ai": "您好！感谢您的购买。请问您能告诉我机器的型号吗？"
    },
)
memory.save_context(
    inputs={"human": "是的，型号是XG-200。"},
    outputs={
        "ai": "谢谢。我来为您介绍XG-200型号的安装指南。首先，请检查安装地点的供电状态。这台机器需要220V电源。"
    },
)
memory.save_context(
    inputs={"human": "我已经检查了电源。下一步该怎么做？"},
    outputs={
        "ai": "很好。接下来，请将机器放置在平整稳固的表面上。然后，按照用户手册的说明进行线缆连接。"
    },
)
memory.save_context(
    inputs={"human": "如何进行连接？"},
    outputs={
        "ai": "请参考说明书第5页。那里有详细的线缆连接说明。如果您在这个过程中遇到任何困难，我很乐意为您提供进一步的帮助。"
    },
)
memory.save_context(
    inputs={"human": "安装完成后我该做什么？"},
    outputs={
        "ai": "安装完成后，请打开电源并进行初始运行测试。测试程序在说明书第10页有详细说明。如果机器有任何问题或者您需要额外支持，请随时与我们联系。"
    },
)
memory.save_context(
    inputs={"human": "谢谢，这些信息对我很有帮助！"},
    outputs={
        "ai": "我们随时准备为您提供帮助。如果您还有任何问题或需要支持，请随时询问。祝您使用愉快！"
    },
)

Print the conversation stored in memory:

print(memory.load_memory_variables({})["history"])

Recorded output (note that pruning removes the oldest messages first, so with max_token_limit=50 the buffer would normally be expected to hold only the most recent messages that fit):

[HumanMessage(content='您好，我最近从贵公司购买了一台机床。请问如何安装？', additional_kwargs={}, response_metadata={}), AIMessage(content='您好！感谢您的购买。请问您能告诉我机器的型号吗？', additional_kwargs={}, response_metadata={}), HumanMessage(content='您好，我最近从贵公司购买了一台机床。请问如何安装？', additional_kwargs={}, response_metadata={}), AIMessage(content='您好！感谢您的购买。请问您能告诉我机器的型号吗？', additional_kwargs={}, response_metadata={})]
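
With return_messages=True the history comes back as message objects. With return_messages=False it comes back as a single string in which each message carries its human_prefix or ai_prefix. The following is a rough sketch of that string format, not the library's own code; `format_history` and the sample messages are hypothetical.

```python
def format_history(messages, human_prefix="Human", ai_prefix="AI"):
    """Render (role, text) pairs as one prefixed line per message."""
    lines = []
    for role, text in messages:
        prefix = human_prefix if role == "human" else ai_prefix
        lines.append(f"{prefix}: {text}")
    return "\n".join(lines)


history = [
    ("human", "Yes, the model is XG-200."),
    ("ai", "Thank you. Please check the power supply first."),
]

print(format_history(history))
print(format_history(history, human_prefix="Customer", ai_prefix="Support"))
```

The string form is convenient for a plain PromptTemplate, while the message form suits chat models that expect a list of messages.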

Implementing the same logic with the latest version

import os

from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
from tiktoken import get_encoding

checkpointer = MemorySaver()
model = ChatOpenAI(
    temperature=0.1,  # Randomness of the output: lower is more stable and predictable, higher is more creative but may drift (range: 0.0 to 2.0)
    model_name="Qwen/Qwen2.5-7B-Instruct",  # Model name supported by SiliconFlow
    openai_api_key=os.getenv("SILICONFLOW_API_KEY"),  # API key read from the environment
    openai_api_base="https://api.siliconflow.cn/v1",  # Base URL of the SiliconFlow API
)

# cl100k_base is an OpenAI encoding, so for a Qwen model the counts are only approximate
tokenizer = get_encoding("cl100k_base")

def chat_with_token_window(state: MessagesState):
    """Token-limited memory management (at most 2000 tokens)."""
    messages = list(state["messages"])  # Copy so the checkpointed state is not mutated
    max_tokens = 2000

    # Count the tokens currently in the history
    total_tokens = sum(len(tokenizer.encode(msg.content)) for msg in messages)

    # Drop the oldest messages one by one until the history fits the limit
    while total_tokens > max_tokens and len(messages) > 1:
        removed = messages.pop(0)
        total_tokens -= len(tokenizer.encode(removed.content))

    response = model.invoke(messages)
    return {"messages": [response]}

# Build the graph
builder = StateGraph(state_schema=MessagesState)
builder.add_node("chat", chat_with_token_window)
builder.add_edge(START, "chat")
graph = builder.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "user-1"}}

# Test a long dialogue (the history is truncated automatically once it exceeds the token limit)
long_dialogue = [
    "这是很长的第一条消息，包含很多内容..." * 10,
    "第二条也很长..." * 8,
    "第三条消息，问我的名字是什么？"
]

for msg in long_dialogue:
    result = graph.invoke(
        {"messages": [HumanMessage(content=msg)]}, 
        config
    )
    print(f"AI: {result['messages'][-1].content[:100]}...")

Advanced usage

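Custom token counting: you can customize the token counting logic as needed, for example to account for special characters or non-English text.
Dynamic token limits: adjust the token limit to the conversation at hand, for example raising it when the user's question is complex.
Combining with other memory strategies: token-based memory management can be combined with other strategies (such as time-based memory management) for more complete conversation management.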
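
The first two ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: `estimate_tokens` is a hypothetical heuristic (one token per CJK character, one per run of Latin letters or digits) that will not match any real tokenizer exactly, and `dynamic_limit` is a made-up policy that grants more history to longer questions.

```python
import re

# Basic CJK Unified Ideographs block; a simplification that ignores rarer ranges
_CJK = re.compile(r"[\u4e00-\u9fff]")
# Runs of Latin letters, digits, or underscores
_WORD = re.compile(r"[A-Za-z0-9_]+")


def estimate_tokens(text: str) -> int:
    """Rough heuristic: one token per CJK character plus one per Latin word or number."""
    return len(_CJK.findall(text)) + len(_WORD.findall(text))


def dynamic_limit(question: str, base_limit: int = 1000, max_limit: int = 4000) -> int:
    """Hypothetical policy: allow 10 extra history tokens per question token, up to a cap."""
    bonus = 10 * estimate_tokens(question)
    return min(base_limit + bonus, max_limit)


print(estimate_tokens("型号是XG-200"))   # 3 CJK characters + "XG" + "200"
print(dynamic_limit("hello world"))      # base limit plus a small bonus
```

A function like `estimate_tokens` could replace the tiktoken-based counter when the model's own tokenizer is unavailable, at the cost of accuracy.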

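def smart_token_memory(state: MessagesState):
    """Token window plus summarization of older turns (reuses the imports above)."""
    messages = state["messages"][:]  # Work on a copy of the history
    max_tokens = 1000
    model = ChatOpenAI(
        temperature=0.1,  # Randomness of the output: lower is more stable and predictable, higher is more creative but may drift (range: 0.0 to 2.0)
        model_name="Qwen/Qwen2.5-7B-Instruct",  # Model name supported by SiliconFlow
        openai_api_key=os.getenv("SILICONFLOW_API_KEY"),  # API key read from the environment
        openai_api_base="https://api.siliconflow.cn/v1"  # Base URL of the SiliconFlow API
    )
    tokenizer = get_encoding("cl100k_base")

    def count_tokens(msgs):
        return sum(len(tokenizer.encode(m.content)) for m in msgs)

    # If over the limit, first summarize the earlier part of the conversation.
    # This runs once: looping here would never terminate for a three-message
    # history, because replacing one message with one summary keeps the length at three.
    if count_tokens(messages) > max_tokens and len(messages) > 2:
        # Summarize the earliest 50% of the messages
        mid_point = len(messages) // 2
        early_msgs = messages[:mid_point]

        # The prompt says: "Summarize the following conversation (within 30 characters):"
        summary_prompt = f"总结以下对话（30字内）：{' '.join(m.content for m in early_msgs)}"
        summary = model.invoke([("user", summary_prompt)])

        # Replace the early messages with the summary (the prefix means "Summary:")
        messages = [AIMessage(content=f"总结：{summary.content}")] + messages[mid_point:]

    # Finally, truncate precisely to the token limit
    while count_tokens(messages) > max_tokens and len(messages) > 1:
        messages.pop(0)

    response = model.invoke(messages)
    return {"messages": [response]}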
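
The third idea, combining token-based and time-based management, can be sketched without any LLM calls. Everything here is hypothetical: `TimedMessage`, `prune`, and the word-count tokenizer are illustrative names, and the sketch assumes the messages are already ordered oldest first.

```python
from dataclasses import dataclass


@dataclass
class TimedMessage:
    text: str
    timestamp: float  # seconds, on any consistent clock


def prune(messages, now, max_age_seconds, max_tokens, count_tokens):
    """Drop messages that are too old, then drop the oldest until the token budget fits."""
    # Step 1: time-based filter
    fresh = [m for m in messages if now - m.timestamp <= max_age_seconds]
    # Step 2: token-based window over what is left
    total = sum(count_tokens(m.text) for m in fresh)
    start = 0
    while start < len(fresh) and total > max_tokens:
        total -= count_tokens(fresh[start].text)
        start += 1
    return fresh[start:]


def word_count(text):
    # Stand-in tokenizer: one token per whitespace-separated word
    return len(text.split())


log = [
    TimedMessage("a b c", 0),
    TimedMessage("d e f g", 100),
    TimedMessage("h i", 200),
]
kept = prune(log, now=250, max_age_seconds=200, max_tokens=5, count_tokens=word_count)
print([m.text for m in kept])
```

Running the time filter first means stale messages never consume the token budget, which is one reasonable ordering among several.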
