Do You Know How to Use Chat Memory?

What is chat memory?

To borrow the description from the official documentation:

Large language models (LLMs) are stateless, meaning they do not retain information about previous interactions. This can be a limitation when you want to maintain context or state across multiple interactions. To address this, Spring AI provides chat memory features that allow you to store and retrieve information across multiple interactions with the LLM.

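In short, large language models (LLMs) are stateless: they keep nothing from your earlier exchanges. To carry context or state across multiple interactions, you need a chat memory mechanism.

How chat memory is implemented

Set the Spring AI framework aside for a moment. If we want an LLM to remember a multi-turn conversation, the conversation record has to be stored somewhere; we call that record the context. We therefore have to maintain the context ourselves for every request and add the content of earlier requests to the next one, so that the LLM can see what has already been said.

In practice, the system concatenates the turns in order into one long text and uses it as the model input. Role markers (such as "User:" and "Assistant:") are usually added to help the model tell the speakers apart.

Format example: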


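User: Hello
Assistant: Hello! How can I help you?
User: I'd like to book a flight to Shanghai
Assistant: Sure, when do you plan to leave?
User: Next Friday

The model generates its next reply based on this complete context.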
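The concatenation step can be sketched in a few lines of plain Java. This is only an illustration under assumptions: the `Turn` record and the `render` method are hypothetical names made up for this example, and real chat APIs usually accept a structured list of role-tagged messages rather than one flat string.

```java
import java.util.List;

class TranscriptSketch {

    // Hypothetical message type for this sketch: a role label plus the text.
    record Turn(String role, String text) {}

    // Join all turns in order, one per line, each prefixed with its role label.
    static String render(List<Turn> turns) {
        StringBuilder sb = new StringBuilder();
        for (Turn t : turns) {
            sb.append(t.role()).append(": ").append(t.text()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Turn> turns = List.of(
                new Turn("User", "Hello"),
                new Turn("Assistant", "Hello! How can I help you?"),
                new Turn("User", "I'd like to book a flight to Shanghai"));
        // The rendered text is what would be sent to the model as context.
        System.out.print(render(turns));
    }
}
```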

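As the number of chat calls grows, the messages list (the accumulated history sent with each request) keeps getting longer, so every request consumes more tokens. At some point the messages take up more tokens than the model's context window supports. Some strategy is therefore needed to keep the list within a manageable size, for example keeping only the 20 most recent messages as the context for each request.

In other words, without the help of a framework we have to do a lot of work by hand, for example:

Save the conversation record manually

Retrieve the history manually

Splice the conversation history into the prompt manually

Manage the history window manually, so that it does not grow without bound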
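The four manual chores above can be sketched as one small class. This is a minimal, single-threaded illustration that assumes nothing beyond the JDK; `ManualChatHistory`, `Msg` and their methods are hypothetical names for this example, and the window simply counts messages rather than tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ManualChatHistory {

    // Hypothetical message type: role label plus text.
    record Msg(String role, String text) {}

    private final int maxMessages;
    // One message list per conversation id (not thread-safe; sketch only).
    private final Map<String, List<Msg>> store = new HashMap<>();

    ManualChatHistory(int maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        this.maxMessages = maxMessages;
    }

    // Save a message, then trim the oldest entries so the window stays bounded.
    void add(String conversationId, Msg msg) {
        List<Msg> list = store.computeIfAbsent(conversationId, id -> new ArrayList<>());
        list.add(msg);
        while (list.size() > maxMessages) {
            list.remove(0);
        }
    }

    // Retrieve the retained history for one conversation (empty if unknown).
    List<Msg> history(String conversationId) {
        return List.copyOf(store.getOrDefault(conversationId, List.of()));
    }

    // Splice the retained history and the new question into one prompt text.
    String buildPrompt(String conversationId, String question) {
        StringBuilder sb = new StringBuilder();
        for (Msg m : history(conversationId)) {
            sb.append(m.role()).append(": ").append(m.text()).append('\n');
        }
        return sb.append("User: ").append(question).append('\n').toString();
    }

    public static void main(String[] args) {
        ManualChatHistory memory = new ManualChatHistory(20);
        memory.add("demo", new Msg("User", "Hello"));
        memory.add("demo", new Msg("Assistant", "Hello! How can I help you?"));
        System.out.print(memory.buildPrompt("demo", "I'd like to book a flight to Shanghai"));
    }
}
```

Even this toy version has to decide where to store messages, how to key them by conversation, and when to evict, which is exactly the work a framework abstraction is meant to take over.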

Chat memory in Spring AI

Again, to borrow the description from the official documentation:

The ChatMemory abstraction allows you to implement various types of memory to support different use cases. The underlying storage of the messages is handled by the ChatMemoryRepository, whose sole responsibility is to store and retrieve messages. It's up to the ChatMemory implementation to decide which messages to keep and when to remove them. Examples of strategies could include keeping the last N messages, keeping messages for a certain time period, or keeping messages up to a certain token limit.

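Roughly speaking: ChatMemory is an abstraction that decides which messages to keep and which to remove. Underneath, it relies on a ChatMemoryRepository for the actual storage; the repository's sole responsibility is to store and retrieve messages.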
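The split between the two layers can be illustrated without Spring AI at all. The sketch below is dependency-free and is not the library's code: `Repo`, `MapRepo` and `WordBudgetMemory` are hypothetical names, and the retention policy uses a whitespace word count as a crude stand-in for a real token limit (one of the strategies the documentation mentions).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MemoryLayersSketch {

    // Storage layer: stores and retrieves only, with no retention policy.
    interface Repo {
        List<String> find(String conversationId);
        void save(String conversationId, List<String> messages);
    }

    // Simplest possible storage: a map held in process memory.
    static class MapRepo implements Repo {
        private final Map<String, List<String>> data = new HashMap<>();

        public List<String> find(String conversationId) {
            return data.getOrDefault(conversationId, List.of());
        }

        public void save(String conversationId, List<String> messages) {
            data.put(conversationId, List.copyOf(messages));
        }
    }

    // Policy layer: decides which messages survive, delegates storage to Repo.
    static class WordBudgetMemory {
        private final Repo repo;
        private final int maxWords;

        WordBudgetMemory(Repo repo, int maxWords) {
            this.repo = repo;
            this.maxWords = maxWords;
        }

        void add(String conversationId, String message) {
            List<String> all = new ArrayList<>(repo.find(conversationId));
            all.add(message);
            // Drop the oldest messages while over budget; always keep the newest.
            while (all.size() > 1 && countWords(all) > maxWords) {
                all.remove(0);
            }
            repo.save(conversationId, all);
        }

        List<String> get(String conversationId) {
            return repo.find(conversationId);
        }

        private static int countWords(List<String> messages) {
            int n = 0;
            for (String m : messages) {
                n += m.isBlank() ? 0 : m.trim().split("\\s+").length;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        WordBudgetMemory memory = new WordBudgetMemory(new MapRepo(), 5);
        memory.add("demo", "one two three");
        memory.add("demo", "four five six");
        System.out.println(memory.get("demo"));
    }
}
```

Swapping `MapRepo` for a database-backed implementation would not touch the policy class, which is the point of keeping the two responsibilities apart.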

Here is the definition of ChatMemory:


public interface ChatMemory {

   String DEFAULT_CONVERSATION_ID = "default";

   /**
    * The key to retrieve the chat memory conversation id from the context.
    */
   String CONVERSATION_ID = "chat_memory_conversation_id";

   /**
    * Save the specified message in the chat memory for the specified conversation.
    */
   default void add(String conversationId, Message message) {
      Assert.hasText(conversationId, "conversationId cannot be null or empty");
      Assert.notNull(message, "message cannot be null");
      this.add(conversationId, List.of(message));
   }

   /**
    * Save the specified messages in the chat memory for the specified conversation.
    */
   void add(String conversationId, List<Message> messages);

   /**
    * Get the messages in the chat memory for the specified conversation.
    */
   List<Message> get(String conversationId);

   /**
    * Clear the chat memory for the specified conversation.
    */
   void clear(String conversationId);

}

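From the definition above, ChatMemory declares three abstract methods (plus a default single-message add overload that delegates to the list version). It prescribes no particular implementation; it only defines the contract.

add(String conversationId, List<Message> messages): add messages to a conversation
List<Message> get(String conversationId): get the messages of a conversation
void clear(String conversationId): clear the messages of a conversation

Spring AI's default ChatMemory implementation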

Spring AI auto-configures a ChatMemory bean that you can use directly in your application. By default, it uses an in-memory repository to store messages (InMemoryChatMemoryRepository) and a MessageWindowChatMemory implementation to manage the conversation history. If a different repository is already configured (e.g., Cassandra, JDBC, or Neo4j), Spring AI will use that instead.


@AutoConfiguration
@ConditionalOnClass({ ChatMemory.class, ChatMemoryRepository.class })
public class ChatMemoryAutoConfiguration {

   @Bean
   @ConditionalOnMissingBean
   ChatMemoryRepository chatMemoryRepository() {
      return new InMemoryChatMemoryRepository();
   }

   @Bean
   @ConditionalOnMissingBean
   ChatMemory chatMemory(ChatMemoryRepository chatMemoryRepository) {
      return MessageWindowChatMemory.builder().chatMemoryRepository(chatMemoryRepository).build();
   }

}

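According to the official documentation, Spring AI auto-configures a MessageWindowChatMemory, which by default uses an InMemoryChatMemoryRepository to store the conversation history.

MessageWindowChatMemory is the implementation class that manages the context of a multi-turn conversation. It implements a sliding message window, an approach that is very common and efficient in real projects.

🧩 1. Core goal: what problem does it solve?

In a multi-turn conversation, the model input is limited by the context window length (for example 8K or 32K tokens). With too many turns, the conversation exceeds that limit.

👉 So a mechanism is needed that keeps only the most recent N messages (the maxMessages setting, 20 by default) and discards older history. That is where the name "message window" comes from.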

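Rule	Description
🟢 When a new SystemMessage is added	All old `SystemMessage` entries are removed (only the latest system instruction is kept)
🟡 Eviction when the limit is exceeded	`UserMessage` and `AssistantMessage` are evicted first, oldest first; `SystemMessage` is kept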

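The overall idea is:

The total number of messages must not exceed the window size.
System messages get special treatment: a new system message replaces the old ones, and if the total is still over the limit, the oldest UserMessage and AssistantMessage entries are evicted. The benefit is that system messages get "override" semantics: when a new instruction arrives, the old one is void.

Example: • The stored messages contain: SystemMessage("You are a customer service agent")

• The new messages add: SystemMessage("You are a programmer") → this is a new SystemMessage, so the old "customer service" instruction is removed and only the new one is kept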
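The two rules can be traced with a small dependency-free sketch. It mirrors the behavior described above but is not Spring AI's source code: `WindowRuleSketch`, `Role`, `Msg` and `merge` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WindowRuleSketch {

    enum Role { SYSTEM, USER, ASSISTANT }

    // Hypothetical message type: records compare by value, so an identical
    // system message that is already stored does not count as "new".
    record Msg(Role role, String text) {}

    static List<Msg> merge(List<Msg> stored, List<Msg> incoming, int maxMessages) {
        // Rule 1: does the incoming batch bring a system message we have not stored yet?
        boolean hasNewSystem = incoming.stream()
                .anyMatch(m -> m.role() == Role.SYSTEM && !stored.contains(m));

        List<Msg> merged = new ArrayList<>();
        for (Msg m : stored) {
            // A new system message makes every stored system message obsolete.
            if (hasNewSystem && m.role() == Role.SYSTEM) {
                continue;
            }
            merged.add(m);
        }
        merged.addAll(incoming);

        int toRemove = merged.size() - maxMessages;
        if (toRemove <= 0) {
            return merged;
        }

        // Rule 2: evict the oldest non-system messages until the window fits.
        List<Msg> trimmed = new ArrayList<>();
        int removed = 0;
        for (Msg m : merged) {
            if (m.role() == Role.SYSTEM || removed >= toRemove) {
                trimmed.add(m);
            } else {
                removed++;
            }
        }
        return trimmed;
    }

    public static void main(String[] args) {
        List<Msg> stored = List.of(
                new Msg(Role.SYSTEM, "You are a customer service agent"),
                new Msg(Role.USER, "Hi"),
                new Msg(Role.ASSISTANT, "Hello"));
        List<Msg> incoming = List.of(
                new Msg(Role.SYSTEM, "You are a programmer"),
                new Msg(Role.USER, "Write a loop"));
        // With a window of 3, the old system message and the oldest user message go.
        merge(stored, incoming, 3).forEach(System.out::println);
    }
}
```

In the `main` example, the old system message is dropped by the override rule, which leaves four messages for a window of three, so the oldest user message ("Hi") is evicted as well.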

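How to use MessageWindowChatMemory in Spring AI

As explained above, Spring AI's default configuration already provides a MessageWindowChatMemory, so you can obtain it through constructor injection.

Option 1: inject it directly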


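import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

import static org.springframework.ai.chat.memory.ChatMemory.CONVERSATION_ID;

@RestController
public class ChatWithMemoryController {

    private final ChatClient chatClient;

    private final ChatMemory chatMemory;

    // The auto-configured ChatMemory bean is injected through the constructor.
    public ChatWithMemoryController(ChatClient.Builder builder, ChatMemory chatMemory) {
        this.chatMemory = chatMemory;
        // The advisor reads the history before each call and stores the new turn afterwards.
        MessageChatMemoryAdvisor memoryAdvisor = MessageChatMemoryAdvisor.builder(this.chatMemory)
                .build();
        this.chatClient = builder
                .defaultSystem("You are a very experienced knowledge assistant")
                .defaultAdvisors(memoryAdvisor, new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping(value = "/v2/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ChatResponse> chat(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                // Pass the conversation ID as an advisor parameter; hard-coded to "1" for debugging
                .advisors(advisorSpec -> advisorSpec.param(CONVERSATION_ID, "1"))
                .stream()
                .chatResponse();
    }
}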

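Option 2: build it yourself. In most real-world designs we need to customize it, because the default chat memory lives in process memory: first, it is lost when the service goes down, and second, it cannot be shared across instances in a distributed deployment. Note that the builder call below only changes the window size; without an explicit chatMemoryRepository(...) it still stores messages in memory, so pass a persistent ChatMemoryRepository (for example a JDBC-backed one) to address those two problems.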


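// Constructor fragment: builder (ChatClient.Builder) and vectorStore (VectorStore)
// are assumed to be injected; their declarations are not shown here.
MessageWindowChatMemory chatMemory = MessageWindowChatMemory.builder()
        // Keep at most 10 messages per conversation
        .maxMessages(10)
        .build();
this.chatMemory = chatMemory;

MessageChatMemoryAdvisor memoryAdvisor = MessageChatMemoryAdvisor.builder(chatMemory)
        .build();

this.chatClient = builder
        .defaultSystem("You are a very experienced knowledge assistant")
        .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore), memoryAdvisor)
        .build();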