LLMOps Development (Part 2): Memory

Memory

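As of v0.3, LangChain recommends implementing memory with LangGraph. All of the legacy memory components are instantiated as traditional chains and do not fit well with the LCEL style.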

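Common approaches to implementing memory
- Buffer memory
  - The most basic approach. Every message generated by the Human and the AI is stored, and the full list of saved chat messages is passed into the prompt each time. Adding the conversation history to the user's input lets the LLM understand what was said earlier, and within the context-window limit this kind of memory is lossless.
- Buffer window memory
  - Buffer window memory keeps only the most recent Human/AI messages. It builds on buffer memory but adds a window size k: only the last k interactions are kept, and earlier ones are "forgotten".
- Token buffer memory
  - Token buffer memory also builds on buffer memory, but instead of counting interactions it adds a token limit, max_tokens: once the chat history exceeds that many tokens, the oldest interactions are forgotten.
- Summary memory
  - Instead of passing the raw messages to the LLM, the messages can be summarized so that only the summary is passed each time. This mode is most useful for long conversations, because keeping the full history verbatim in the prompt consumes too many tokens.
- Summary buffer memory
  - Summary buffer memory combines summary memory with buffer window memory. It keeps the raw content of the most recent interactions and, rather than simply discarding older interactions, compiles them into a summary that is used alongside the recent messages. It decides when to flush interactions by token length rather than by the number of interactions.
- Vector store memory
  - Memories are stored in a vector store, and the top K best-matching documents are retrieved on each call. This mode can remember everything: it preserves details better than summary memory but worse than buffer memory, and its token usage is moderate.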
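To make the difference between buffer window memory and token buffer memory concrete, here is a minimal sketch in plain TypeScript. The `ChatMessage` type, the function names, and the character-count stand-in for a tokenizer are assumptions made for illustration; they are not LangChain APIs.

```typescript
// Hypothetical message shape for this sketch; not LangChain's BaseMessage.
interface ChatMessage {
  role: "human" | "ai";
  content: string;
}

// Buffer window memory: keep only the last k Human/AI interactions (2k messages).
function windowMemory(messages: ChatMessage[], k: number): ChatMessage[] {
  if (k <= 0) return []; // slice(-0) would return everything, so guard k = 0
  return messages.slice(-2 * k);
}

// Token buffer memory: keep the newest messages whose total token count fits maxTokens.
// countTokens defaults to character length as a stand-in for a real tokenizer.
function tokenBufferMemory(
  messages: ChatMessage[],
  maxTokens: number,
  countTokens: (text: string) => number = (text) => text.length,
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  // Walk from newest to oldest and stop at the first message that no longer fits.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i].content);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}

const chatLog: ChatMessage[] = [
  { role: "human", content: "hi" },
  { role: "ai", content: "hello" },
  { role: "human", content: "weather?" },
  { role: "ai", content: "sunny" },
];
console.log(windowMemory(chatLog, 1).map((m) => m.content).join(", ")); // weather?, sunny
console.log(tokenBufferMemory(chatLog, 12).map((m) => m.content).join(", ")); // sunny
```

The token variant stops at the first message that does not fit instead of skipping it, so the kept history is always a contiguous tail of the conversation.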

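Adding memory with RunnableWithMessageHistory

RunnableWithMessageHistory wraps an existing chain that implements the Runnable protocol and automatically gives it memory.
How it works: internally, it uses the session ID passed in the runtime config (`configurable.sessionId`) to look up the matching message history instance, merges that history into the user's input dictionary, and passes the result to the original Runnable chain. It also attaches a callback handler to the new Runnable that stores the LLM's output in the message history.
- In other words, it listens for the chain's end callback and saves the user's input together with the AI's response.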

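```typescript
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOpenAI } from "@langchain/openai";

// Inside an async method of the service class; query and sessionId come from the caller
const prompt = await ChatPromptTemplate.fromMessages([
  ["system", "You are a chatbot developed by OpenAI. Please answer the user's question. The current date is {now}"],
  new MessagesPlaceholder("history"), // the stored history messages are injected here
  ["human", "{query}"],
]).partial({ now: () => new Date().toLocaleDateString() });

// The original snippet does not show how `chain` is built; a prompt | model | parser chain is assumed
const chain = prompt.pipe(new ChatOpenAI()).pipe(new StringOutputParser());

const withHistoryChain = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: (sessionId: string) => this._getHistory(sessionId),
  historyMessagesKey: "history",
  // Required: otherwise the wrapper cannot tell which key holds the user's input and cannot store it
  inputMessagesKey: "query",
});

const res = await withHistoryChain.invoke(
  { query },
  { configurable: { sessionId } },
);
console.log("res", res);
```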
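The flow described above (look up history by session ID, merge it into the input, run the chain, then save the input and the reply) can be sketched without LangChain. This is a simplified, synchronous illustration: `Msg`, `Chain`, `withMessageHistory`, `getHistory`, and the in-memory `store` are hypothetical names, and the real RunnableWithMessageHistory works asynchronously through Runnable configs and callbacks rather than a plain function wrapper.

```typescript
// Hypothetical minimal types for this sketch (not LangChain's classes).
interface Msg {
  role: "human" | "ai";
  content: string;
}
type ChainInput = { query: string; history: Msg[] };
type Chain = (input: ChainInput) => string;

// Wraps a chain so that it reads and writes per-session history automatically.
function withMessageHistory(
  chain: Chain,
  getHistory: (sessionId: string) => Msg[],
): (input: { query: string }, config: { sessionId: string }) => string {
  return (input, config) => {
    // 1. Resolve the history for this session from the runtime config.
    const history = getHistory(config.sessionId);
    // 2. Merge a copy of the stored messages into the input under "history".
    const output = chain({ query: input.query, history: [...history] });
    // 3. Once the chain has finished, store the user input and the AI reply.
    history.push({ role: "human", content: input.query });
    history.push({ role: "ai", content: output });
    return output;
  };
}

// Usage: an in-memory store keyed by session ID and a fake chain.
const store: Record<string, Msg[]> = {};
function getHistory(sessionId: string): Msg[] {
  if (!store[sessionId]) store[sessionId] = [];
  return store[sessionId];
}
const fakeChain: Chain = ({ query, history }) => `${query} (history: ${history.length})`;
const chat = withMessageHistory(fakeChain, getHistory);
console.log(chat({ query: "hi" }, { sessionId: "a" })); // hi (history: 0)
console.log(chat({ query: "again" }, { sessionId: "a" })); // again (history: 2)
console.log(chat({ query: "hi" }, { sessionId: "b" })); // hi (history: 0)
```

Because each session ID maps to its own list, two users talking to the same wrapped chain never see each other's history.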

Using the built-in in-memory history
InMemoryChatMessageHistory is LangChain's built-in in-memory message history. To implement your own storage, subclass BaseListChatMessageHistory.

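```typescript
import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";

// Class members of the service: one history instance per session ID
store: Record<string, InMemoryChatMessageHistory> = {};

_getHistory(sessionId: string): InMemoryChatMessageHistory {
  // Create the history lazily the first time a session ID is seen
  if (!this.store[sessionId]) {
    this.store[sessionId] = new InMemoryChatMessageHistory();
  }
  return this.store[sessionId];
}
```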

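The bind function: binding runtime arguments to a Runnable

Purpose
- 1. bind() changes the default call arguments of the underlying Runnable. The bound arguments are passed automatically on every call, so you do not have to pass them by hand and can invoke the result just like the original chain.
- 2. It works around a limitation of wrapping functions with RunnableLambda: once wrapped, the invoke function of every Runnable component accepts only a single input argument (of any type), so any extra parameters have to be supplied some other way, such as through bind().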

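```typescript
import { RunnableLambda } from "@langchain/core/runnables";

async function lambdaBind() {
  // The wrapped function receives the input first and the merged call options second
  function get_weather(location: string, config: any) {
    return `The weather in ${location} is 24 ${config.unit}`;
  }

  // bind() stores { unit } and merges it into the options on every call
  const get_weather_runnable = RunnableLambda.from(get_weather).bind({
    unit: "degrees Celsius",
  });

  const res = await get_weather_runnable.invoke("Beijing");
  console.log("res", res);
}
```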

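When invoke, stream, batch, or a similar method is called on the bound Runnable, bind merges the bound arguments with all of the other call options before execution.
- Not every Runnable accepts extra arguments passed through bind; it depends on that component's implementation, so check the source.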
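The merge step described above can be illustrated with a tiny stand-in class. `MiniRunnable` and `Options` are hypothetical and only mimic the idea; in this sketch call-time options override bound ones on a key conflict, which is a choice made here, not a statement about LangChain's precedence rules.

```typescript
// Hypothetical, simplified Runnable used only to illustrate bind(); not LangChain's class.
type Options = Record<string, unknown>;

class MiniRunnable<I, O> {
  private readonly func: (input: I, options: Options) => O;
  private readonly bound: Options;

  constructor(func: (input: I, options: Options) => O, bound: Options = {}) {
    this.func = func;
    this.bound = bound;
  }

  // bind() runs nothing: it returns a new runnable that remembers extra options.
  bind(kwargs: Options): MiniRunnable<I, O> {
    return new MiniRunnable(this.func, { ...this.bound, ...kwargs });
  }

  // On every call, the bound options are merged with the call-time options.
  // In this sketch, call-time options win on a key conflict.
  invoke(input: I, options: Options = {}): O {
    return this.func(input, { ...this.bound, ...options });
  }
}

const getWeather = new MiniRunnable(
  (location: string, options: Options) => `The weather in ${location} is 24 ${String(options.unit)}`,
);
const celsius = getWeather.bind({ unit: "degrees Celsius" });
console.log(celsius.invoke("Beijing")); // The weather in Beijing is 24 degrees Celsius
```

Note that bind() returns a new object and leaves the original untouched, which is why the same base runnable can be bound with different defaults in different places.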

withConfig

Constructor: define model parameters such as temperature
- or override them temporarily at call time

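```typescript
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  configuration: { baseURL: "your_api_url" },
  temperature: 0.1, // default for every call
});

// Override temperature at call time
// (whether a per-call temperature is honored depends on the model class's call options)
const response = await model.invoke("Hello", { temperature: 0.5 });
```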

withConfig() configures runtime behavior (such as callbacks, tags, and concurrency control), so temperature cannot be set through withConfig.

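configurable_fields and configurable_alternatives (Python only)

1. configurable_fields(): similar to bind(), but instead of passing the arguments when the chain is built, it lets you specify arguments for a given step while the chain is running, which makes it more flexible than bind().
2. configurable_alternatives(): lets you replace one part of the chain with an alternative at runtime, for example switching the prompt template or the LLM.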

withRetry，withFallbacks，withListeners

A count function wrapped with retry, fallback, and listeners

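```typescript
import { RunnableLambda } from "@langchain/core/runnables";

// count is defined further below; the wrapped function receives (input, config)
const retry_chain = RunnableLambda.from(count)
  .withRetry({
    stopAfterAttempt: 4,
    // The second argument is the input that failed, not the attempt number
    onFailedAttempt: (err, input) => {
      console.log(`Attempt failed for input ${JSON.stringify(input)}, error: ${err.message}`);
    },
  })
  .withFallbacks({
    fallbacks: [
      RunnableLambda.from(() => {
        console.log("fall back");
      }),
    ],
  })
  .withListeners({
    onStart: () => {
      console.log("Execution started");
    },
    onEnd: () => {
      console.log("Execution finished");
    },
    // onError receives the Run object; the error message is in run.error
    onError: (run) => {
      console.log("Execution error", run.error);
    },
  });
```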
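The order of the wrappers matters: retries happen first, the fallback only runs once every retry has failed, and the listeners observe the whole combination. The sketch below reproduces those semantics with hypothetical helper functions in plain, synchronous TypeScript; LangChain's real implementations are async and their callback signatures differ (for example, its onFailedAttempt receives the input rather than an attempt number).

```typescript
// Hypothetical helpers illustrating the semantics; not LangChain's implementation.
type Fn<I, O> = (input: I) => O;

// Retry: call fn up to maxAttempts times, rethrowing the last error.
function withRetry<I, O>(
  fn: Fn<I, O>,
  maxAttempts: number,
  onFailedAttempt?: (err: unknown, attempt: number) => void,
): Fn<I, O> {
  return (input) => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return fn(input);
      } catch (err) {
        lastError = err;
        onFailedAttempt?.(err, attempt);
      }
    }
    throw lastError;
  };
}

// Fallbacks: try the primary, then each fallback in order until one succeeds.
function withFallbacks<I, O>(primary: Fn<I, O>, fallbacks: Fn<I, O>[]): Fn<I, O> {
  return (input) => {
    let lastError: unknown;
    for (const fn of [primary, ...fallbacks]) {
      try {
        return fn(input);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}

// Listeners: hooks that fire around a call without changing its result.
function withListeners<I, O>(
  fn: Fn<I, O>,
  hooks: { onStart?: (input: I) => void; onEnd?: (input: I, output: O) => void; onError?: (input: I, err: unknown) => void },
): Fn<I, O> {
  return (input) => {
    hooks.onStart?.(input);
    try {
      const output = fn(input);
      hooks.onEnd?.(input, output);
      return output;
    } catch (err) {
      hooks.onError?.(input, err);
      throw err;
    }
  };
}

// Usage: a function that fails on its first two calls, then succeeds.
let calls = 0;
const flaky: Fn<number, number> = (x) => {
  calls++;
  if (calls < 3) throw new Error(`fail #${calls}`);
  return x * 2;
};
const events: string[] = [];
const chain = withListeners(
  withFallbacks(withRetry(flaky, 4, (_err, n) => events.push(`retry ${n}`)), [() => -1]),
  { onStart: () => events.push("start"), onEnd: () => events.push("end") },
);
console.log(chain(21)); // 42
```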

Runnable config and listeners can also be used to implement automatic memory.

Pass the memory in at call time:

```typescript
const res = await retry_chain.invoke(
  { input: 2 },
  {
    configurable: { memory: "These are the history messages" },
  },
);
```

A function wrapped by RunnableLambda.from receives the call config as its second argument:

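```typescript
function count(x: any, config: any) {
  // if (config?.configurable?.memory) {
  //   console.log("config.configurable.memory", config.configurable.memory);
  //   // Load the memory here, e.g. with a legacy memory class:
  //   // const mem = await memory.loadMemoryVariables({ input: "" });
  // }

  // Return the memory passed through the runtime config together with the input
  return { history: config?.configurable?.memory, x };
}
```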

Use withListeners to save the memory in onEnd:

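```typescript
const chainWithMemory = RunnableLambda.from(count).withListeners({
  onEnd: (run) => {
    // save_memory(): store the memory here, e.g. from run.inputs and run.outputs
    console.log("Execution finished");
  },
});
```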
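Putting these pieces together, the pattern is: the wrapped function reads memory from its second (config) argument, and an end hook saves the finished turn. The sketch below mimics that flow in plain TypeScript. `withOnEnd` stands in for `withListeners({ onEnd })`, `saveMemory` for the `save_memory()` placeholder, and `Turn`, `RunConfig`, and `Step` are hypothetical types, not LangChain APIs.

```typescript
// Hypothetical types for this sketch.
interface Turn {
  input: string;
  output: string;
}
interface RunConfig {
  configurable: { memory: Turn[] };
}
type Step = (input: string, config: RunConfig) => string;

// Like a RunnableLambda function: the second parameter is the runtime config.
const answer: Step = (input, config) =>
  `reply to "${input}" with ${config.configurable.memory.length} earlier turns`;

// Like withListeners({ onEnd }): run the step, then call onEnd with input and output.
function withOnEnd(step: Step, onEnd: (input: string, output: string, config: RunConfig) => void): Step {
  return (input, config) => {
    const output = step(input, config);
    onEnd(input, output, config);
    return output;
  };
}

// Hypothetical save_memory: append the finished turn to the memory held in the config.
function saveMemory(input: string, output: string, config: RunConfig): void {
  config.configurable.memory.push({ input, output });
}

const chatWithMemory = withOnEnd(answer, saveMemory);
const config: RunConfig = { configurable: { memory: [] } };
console.log(chatWithMemory("hello", config)); // reply to "hello" with 0 earlier turns
console.log(chatWithMemory("again", config)); // reply to "again" with 1 earlier turns
```

Keeping the save step in the end hook means the wrapped function only ever reads memory, so it stays easy to test in isolation.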