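Day 16: Smart Context Management - Giving Your Agent Memory Management Skills

🤖 Series: A 3-Month Learning Plan for Java Engineers Moving to AI Agents

👤 Author: 宸丶一 | 28-year-old Java programmer, currently learning AI Agent development

🎯 Today's goals: context windows, automatic checkpoints, token budget control

💬 Personal motto: I don't know whether code will change the world, but first let me leave work on time.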


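Preface

Day 15 covered the sub-agent system. Today's topic is more practical: smart context management.

Put plainly, it solves one problem: an AI's working memory is limited, so how do we make it remember what matters most?

It is like a connection pool in Java: the pool has a fixed size, and you cannot open connections without limit. A context window is the same. There are only so many tokens, and once it is full you have to throw something out.

Today we learn what to throw out, how to throw it out, and whether to save a checkpoint first.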


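Learning Objectives

  1. Understand what a context window is and why it is limited
  2. Learn three management strategies: FIFO / LRU / importance ranking
  3. Understand the automatic checkpoint system (the save mechanism)
  4. Get comfortable with token budget control and context compression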

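1. The Context Window: the AI's Short-Term Memory

First, one definition: the context window is how much information the AI can see at one time.

Think of it as human short-term memory: you can only hold so many things in mind at once, and anything beyond that gets forgotten.

In Java terms: context window = database connection pool. The pool has a fixed size, and used-up connections have to be reclaimed or new requests cannot get in.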

[Flowchart: user input -> context window -> AI model. When the window is full, a management strategy is needed: keep the important information, discard the unimportant.]

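Why is the window limited?

  • More tokens mean more computation and higher cost
  • GPU memory is finite, and it can only hold so much context
  • Too much information can actually confuse the AI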
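
To make "a pool of fixed size" concrete, here is a minimal sketch of a context window as a data structure. The class name `ContextWindow`, its fields, and the "2 tokens per character" estimate are all assumptions made for illustration; a real system would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """A fixed-capacity message store (hypothetical sketch)."""
    max_tokens: int = 4096
    current_tokens: int = 0
    messages: list = field(default_factory=list)

    @staticmethod
    def estimate_tokens(text: str) -> int:
        # Crude assumption: about 2 tokens per character.
        return len(text) * 2

    def has_room_for(self, text: str) -> bool:
        """True if the text fits without evicting anything."""
        return self.current_tokens + self.estimate_tokens(text) <= self.max_tokens

    def add_message(self, role: str, content: str) -> dict:
        """Append a message and update the running token count."""
        msg = {"role": role, "content": content,
               "tokens": self.estimate_tokens(content)}
        self.messages.append(msg)
        self.current_tokens += msg["tokens"]
        return msg


window = ContextWindow(max_tokens=40)
window.add_message("system", "Be brief.")    # 9 characters -> 18 tokens
window.add_message("user", "Hi there")       # 8 characters -> 16 tokens
print(window.current_tokens)                  # 34
print(window.has_room_for("How are you?"))    # False: 34 + 24 > 40
```

Once `has_room_for` returns False, something has to be evicted before the new message can go in, which is exactly the problem the rest of this post deals with.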

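2. Why Not Just Keep the Most Recent Messages?

You might think: when the window fills up, just delete the old messages and keep the recent ones.

The problem is that the most recent messages are not necessarily the most important.

Here is an example: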

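User: My name is 宸一, I'm 28, and I'm a Java programmer
AI: Hello, 宸一!
User: What's the weather like today?
AI: It's sunny today, 25 degrees
User: What is my name?  <- if only the most recent messages are kept, the AI has forgotten!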

So we need smart management: not blindly deleting the oldest messages, but judging which ones to keep and which to drop.
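
The failure in the conversation above is easy to reproduce. The sketch below uses a hypothetical helper, `keep_last`, that keeps only the n most recent messages, and then checks whether the user's name is still anywhere in what remains.

```python
def keep_last(messages, n):
    """Naive truncation: keep only the n most recent messages."""
    if n <= 0:
        return []
    return messages[-n:]


history = [
    {"role": "user", "content": "My name is 宸一, I'm 28, and I'm a Java programmer"},
    {"role": "assistant", "content": "Hello, 宸一!"},
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "It's sunny today, 25 degrees"},
    {"role": "user", "content": "What is my name?"},
]

truncated = keep_last(history, 3)
name_survives = any("宸一" in m["content"] for m in truncated)
print(name_survives)  # False: both messages that mention the name were dropped
```

With a window of three messages, the model is asked "What is my name?" while holding nothing that could answer it.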


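3. Three Context Management Strategies

There are three strategies here, each suited to different scenarios:

Strategy    How it works                            Best for                              Java analogy
FIFO        First in, first out: remove the oldest  Simple Q&A, no links between turns    Queue
LRU         Remove the least recently accessed      Long conversations with pauses        LinkedHashMap
Importance  Remove the lowest-priority message      Project work with clear priorities    Priority queue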
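
FIFO is the simplest of the three, so here is a minimal sketch of it. The function name `fifo_evict` and the precomputed `"tokens"` field on each message are assumptions of this example, not something fixed by the strategy itself.

```python
from collections import deque


def fifo_evict(messages: deque, current_tokens: int, max_tokens: int, need: int) -> int:
    """Drop the oldest messages until `need` more tokens fit.

    Returns the updated token count. Each message is assumed to be a dict
    with a precomputed "tokens" field.
    """
    while messages and current_tokens + need > max_tokens:
        oldest = messages.popleft()          # first in, first out
        current_tokens -= oldest["tokens"]
    return current_tokens


msgs = deque([
    {"content": "first", "tokens": 10},
    {"content": "second", "tokens": 10},
    {"content": "third", "tokens": 10},
])
used = fifo_evict(msgs, current_tokens=30, max_tokens=32, need=8)
print(used)                              # 20
print([m["content"] for m in msgs])      # ['second', 'third']
```

A `deque` plays the role of Java's `Queue` here: removing from the left end is cheap, and the eviction order is simply the arrival order.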

[Flowchart: context window is full -> choose a strategy -> simple Q&A: FIFO, delete the oldest / long conversation: LRU, delete the least recently used / project-related: importance, delete the lowest-weighted -> free up space and add the new message.]
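
For LRU, Python's `OrderedDict` can stand in for Java's `LinkedHashMap`. The sketch below is one possible shape, with a hypothetical `LRUContext` class. What counts as "accessing" a message is itself an assumption: here the caller decides and calls `touch` explicitly, for example when a later turn refers back to an earlier one.

```python
from collections import OrderedDict


class LRUContext:
    """Evicts the message that has gone longest without being accessed."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.current_tokens = 0
        self._items = OrderedDict()   # msg_id -> message, least recent first

    def touch(self, msg_id):
        """Mark a message as recently used and return it."""
        self._items.move_to_end(msg_id)
        return self._items[msg_id]

    def add(self, msg_id, content: str, tokens: int):
        """Evict least recently used messages until the new one fits, then add it."""
        while self._items and self.current_tokens + tokens > self.max_tokens:
            _, evicted = self._items.popitem(last=False)
            self.current_tokens -= evicted["tokens"]
        self._items[msg_id] = {"content": content, "tokens": tokens}
        self.current_tokens += tokens

    def ids(self):
        return list(self._items)


ctx = LRUContext(max_tokens=30)
ctx.add("name", "My name is 宸一", 10)
ctx.add("weather", "What's the weather?", 10)
ctx.add("reply", "Sunny, 25 degrees", 10)
ctx.touch("name")                  # the name gets referenced again
ctx.add("new", "What is my name?", 10)
print(ctx.ids())                   # ['reply', 'name', 'new']
```

Because the name message was touched, the weather question is the one that gets evicted, even though the name message is older.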

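Code implementation (importance strategy):

python
def importance(window, new_msg):
    """Importance-based eviction: remove the least important messages first.

    Assumes `window` exposes `messages` (a list of dicts with "role" and
    "tokens" keys), `current_tokens`, `max_tokens`, and `add_message()`.
    """
    need = len(new_msg) * 2  # rough estimate: about 2 tokens per character

    def score(m):
        s = 50
        if m["role"] == "system":
            s += 50  # system messages matter most
        if m is window.messages[-1]:
            s += 30  # the most recent message matters more
        return s

    while window.current_tokens + need > window.max_tokens and window.messages:
        removed = min(window.messages, key=score)  # lowest score; the oldest wins ties
        window.messages.remove(removed)
        window.current_tokens -= removed["tokens"]
    return window.add_message("user", new_msg)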

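4. Automatic Checkpoints: the Save Mechanism

With the strategies covered, you might ask: can a discarded message be recovered?

The answer: no, but you can save before discarding. That is what the checkpoint system is for.

It works like save points in a video game: the game saves automatically at key moments, so if things go wrong you can load the save and try again.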

[Flowchart: conversation in progress -> trigger condition? (message count reaches N / topic changes / token usage too high) -> automatically save a checkpoint -> save a snapshot of the current state -> need to roll back? -> load the most recent checkpoint. If nothing triggers, the conversation simply continues.]

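Three automatic trigger strategies:

Trigger           How it works                        Best for
Message count     Save every N messages               Task-oriented conversations with steady output
Token threshold   Save when token usage reaches 80%   Resource-constrained scenarios
Topic change      Save when the topic changes         Project analysis, multi-topic conversations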
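
The first two triggers in the table come down to a single comparison each. Here is a sketch, with hypothetical function names and a default of N = 10 that is purely an assumption (the 80% threshold is the one from the table):

```python
def should_save_on_count(message_count: int, every_n: int = 10) -> bool:
    """Message-count trigger: fire on every N-th message."""
    return message_count > 0 and message_count % every_n == 0


def should_save_on_tokens(used_tokens: int, max_tokens: int, threshold: float = 0.8) -> bool:
    """Token-threshold trigger: fire once usage reaches the threshold."""
    return used_tokens >= max_tokens * threshold


print(should_save_on_count(20))            # True
print(should_save_on_count(7))             # False
print(should_save_on_tokens(3300, 4096))   # True: 3300 is above 80% of 4096
print(should_save_on_tokens(1000, 4096))   # False
```

Both are pure functions of counters the agent already tracks, which keeps them cheap enough to check on every turn.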

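Code implementation:

python
class AutoSave:
    @staticmethod
    def on_topic_change(mgr, state, new_topic, old_topic):
        """Save a checkpoint when the topic changes; return True if one was saved."""
        # `mgr` is assumed to expose save(state, metadata).
        if old_topic and new_topic != old_topic:
            mgr.save(state, {"trigger": "topic change", "topic": new_topic})
            return True
        return False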
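
The manager passed in as `mgr` is not shown in this post. As a rough idea of what it could look like, here is an in-memory sketch: the class name `CheckpointManager`, the `keep` limit, and the `load_latest` method are all hypothetical, and a real implementation might well write to disk instead.

```python
import copy
import time


class CheckpointManager:
    """In-memory checkpoint store (hypothetical sketch)."""

    def __init__(self, keep: int = 5):
        self.keep = keep            # how many checkpoints to retain (must be >= 1)
        self._checkpoints = []

    def save(self, state: dict, meta: dict) -> int:
        """Store a snapshot and return how many checkpoints are now held."""
        # Deep-copy so later changes to `state` cannot alter the snapshot.
        self._checkpoints.append({
            "state": copy.deepcopy(state),
            "meta": dict(meta),
            "saved_at": time.time(),
        })
        self._checkpoints = self._checkpoints[-self.keep:]
        return len(self._checkpoints)

    def load_latest(self):
        """Return a copy of the most recent snapshot, or None if there is none."""
        if not self._checkpoints:
            return None
        return copy.deepcopy(self._checkpoints[-1]["state"])


mgr = CheckpointManager()
state = {"messages": ["hello"], "topic": "python"}
mgr.save(state, {"trigger": "topic change", "topic": "ai"})
state["messages"].append("this turn went wrong")
restored = mgr.load_latest()
print(restored["messages"])   # ['hello']
```

The deep copy is the important design choice: without it, the "snapshot" would just be another reference to the live state and rolling back would restore nothing.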

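5. Token Budget Control: Spend Where It Counts

Tokens are the AI's money, and you have to allocate them well: how much for the system prompt, how much for conversation history, and how much for the current input.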

[Pie chart: token budget allocation (total 4096): system prompt 10%, conversation history 60%, current input 15%, reserved for response 15%.]

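Why split it this way?

  • System prompt, 10%: fixed, just a few sentences, so it does not need much
  • Conversation history, 60%: keeps growing, so it needs plenty of room
  • Current input, 15%: the user's current question
  • Reserved response, 15%: space left for the AI's answer

What if we flipped it, with the system prompt at 60% and history at only 10%? The history could hold only a few messages, so the AI would answer as if it had amnesia, with the earlier context lost.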

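Code implementation:

python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    total: int = 4096           # total budget
    system_pct: float = 0.10    # system prompt: 10%
    history_pct: float = 0.60   # conversation history: 60%
    current_pct: float = 0.15   # current input: 15%
    response_pct: float = 0.15  # reserved for the response: 15%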
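
Turning those percentages into actual token counts is a small piece of arithmetic. The sketch below uses a hypothetical `allocate` helper and whole-number percentages, so the division rounds down exactly and the parts can never add up to more than the total.

```python
def allocate(total: int, percents: dict) -> dict:
    """Split a token budget by whole-number percentages, rounding down."""
    if sum(percents.values()) != 100:
        raise ValueError("percentages must add up to 100")
    return {name: total * pct // 100 for name, pct in percents.items()}


percents = {"system": 10, "history": 60, "current": 15, "response": 15}
budget = allocate(200, percents)
for name, tokens in budget.items():
    print(f"{name:<10}: {tokens:>5} tokens ({percents[name]}%)")
# system 20, history 120, current 30, response 30
```

With a total of 4096, rounding down gives 409 / 2457 / 614 / 614, which leaves 2 tokens unassigned. That is harmless for a budget, whereas rounding up could overshoot the window.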

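6. Context Compression: Throw Out What Needs to Go

With the budget set, the conversation history will still overflow eventually. That is when compression comes in: turning old messages into a summary.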

[Flowchart: original history of 20 messages -> tokens over budget? -> if not, leave as is; if so, separate out the system messages -> keep the 5 most recent -> compress older messages into a summary -> merge: system messages + summary + 5 most recent.]

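Compression logic:

  1. System messages: always kept, never touched
  2. The 5 most recent messages: kept, since the most recent are the most relevant
  3. Older messages: compressed into a one-line summary, for example: History summary: learn Python; choose resources; learn AI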

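Code implementation:

python
class Compressor:
    @staticmethod
    def compress(msgs, max_tok):
        """Return msgs unchanged if within budget; otherwise summarize older turns."""
        # TokenEst.estimate_msgs is a token-estimation helper not shown in this post.
        cur = TokenEst.estimate_msgs(msgs)
        if cur <= max_tok:
            return msgs  # within budget, nothing to compress
        sys_msgs = [m for m in msgs if m.get("role") == "system"]
        conv = [m for m in msgs if m.get("role") != "system"]
        recent = conv[-5:]  # keep the 5 most recent messages verbatim
        old = conv[:-5]     # everything older gets summarized (empty if 5 or fewer)
        result = sys_msgs.copy()
        if old:
            # Collect truncated snippets, de-duplicated in their original order
            # (a set would make the order of the summary unpredictable).
            topics = []
            for m in old:
                c = m.get("content", "")
                snippet = c[:30] + "..." if len(c) > 30 else c
                if snippet not in topics:
                    topics.append(snippet)
            # The summary is stored as a system message, so later calls keep it too.
            result.append({"role": "system", "content": "[History summary: " + "; ".join(topics[:3]) + "]"})
        result.extend(recent)
        return result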
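
The `TokenEst` helper used above is not defined in this post. The sketch below is a stand-in so the compressor has something to call; the rules in it (one token per CJK character, one token per four other characters, and a fixed per-message overhead of 4) are rough assumptions, not the author's estimator and not a real tokenizer.

```python
import math


class TokenEst:
    """Rough token estimator (a hypothetical stand-in, not a real tokenizer)."""

    PER_MESSAGE_OVERHEAD = 4   # assumed cost of role and formatting per message

    @staticmethod
    def estimate(text: str) -> int:
        # Assumption: CJK characters cost about 1 token each,
        # other text about 1 token per 4 characters.
        cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
        other = len(text) - cjk
        return cjk + math.ceil(other / 4)

    @classmethod
    def estimate_msgs(cls, msgs) -> int:
        """Estimate the total for a list of message dicts."""
        return sum(cls.estimate(m.get("content", "")) + cls.PER_MESSAGE_OVERHEAD
                   for m in msgs)


print(TokenEst.estimate("hello world"))    # 3 (11 characters / 4, rounded up)
print(TokenEst.estimate("你好"))            # 2
print(TokenEst.estimate_msgs([{"role": "user", "content": "hello world"}]))  # 7
```

An estimator like this only needs to be consistent, since it is used to compare a running total against a budget. For billing-grade accuracy you would swap in the tokenizer that matches your model.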

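Checkpoints vs Backups

These two concepts are easy to mix up. In short:

Property       Checkpoint                                    Backup
Type           Incremental                                   Full
Frequency      Frequent, saved automatically at key moments  Occasional, triggered manually
Size           Lightweight, stores only the current state    Complete, a full copy
Purpose        Roll back to a given state                    Disaster recovery
Java analogy   Transaction Savepoint                         Database backup

In one sentence: a checkpoint is a save point; a backup is a copy of the save file.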


The Full Flow: Smart Context Management

Putting the concepts above together, the whole flow looks like this:

[Flowchart: user enters a new message -> estimate token count -> over the history budget? -> if not, add it straight to the history; if so, trigger compression (keep system messages + the 5 most recent, compress older messages into a summary) -> checkpoint triggered? -> save a state snapshot -> assemble the full context -> send to the AI model.]
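
As a rough end-to-end sketch, the flow above can be written as one function per turn. Everything here is simplified and hypothetical: the four-characters-per-token estimate, the helper names, and in particular the choice to save a checkpoint right before compressing, so that the full history can still be restored afterwards.

```python
import copy


def estimate(text: str) -> int:
    # Crude assumption: about 1 token per 4 characters.
    return max(1, len(text) // 4)


def total_tokens(msgs) -> int:
    return sum(estimate(m["content"]) for m in msgs)


def compress(history, keep_recent=5):
    """Keep the most recent turns verbatim and fold the rest into a summary."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    if not old:
        return history
    summary = "; ".join(m["content"][:20] for m in old[:3])
    return [{"role": "system", "content": f"[History summary: {summary}]"}] + recent


def handle_turn(system_prompt, history, checkpoints, user_text, history_budget):
    """One pass through the flow: add, check budget, checkpoint, compress, assemble."""
    history = history + [{"role": "user", "content": user_text}]
    compressed = False
    if total_tokens(history) > history_budget:
        checkpoints.append(copy.deepcopy(history))   # save before discarding
        history = compress(history)
        compressed = True
    context = [{"role": "system", "content": system_prompt}] + history
    return context, history, compressed


history = [{"role": "user", "content": f"Long question {i}: " + "details " * 10}
           for i in range(1, 8)]
checkpoints = []
context, history, compressed = handle_turn(
    "You are a helpful tutor.", history, checkpoints,
    "What should I learn next?", history_budget=150)
print(compressed)                 # True
print(len(history))               # 6: one summary + the 5 most recent messages
print(len(checkpoints[0]))        # 8: the full history was saved first
```

The returned `context` list is what would be sent to the model; the actual model call is left out because it depends on the API you use.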


Running It

Run the code and look at the actual output:

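Token budget allocation (total=200):
  system    :    20 tokens (10%)
  history   :   120 tokens (60%)
  current   :    30 tokens (15%)
  response  :    30 tokens (15%)

  [user] Hello, I want to learn Python
    history: 17/120 tokens, 1 message
  [assistant] Python is great for beginners, with a clean syntax.
    history: 46/120 tokens, 2 messages
  ...
  [assistant] Master the Python basics first, then learn machine learning.
  [compress] 258 tokens -> target 120
  [compression done] 258 -> 268 tokens
    history: 268/120 tokens, 10 messages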

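As you can see, compression is triggered automatically once the history exceeds its token budget, and older messages are merged into a summary. Note, though, that this log reports 258 -> 268 tokens: the history is still above the 120-token budget after compression. The compressor shown earlier keeps the five most recent messages verbatim and adds a summary message, so it does not guarantee that the result fits the budget.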
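
Since the log ends over budget, one possible refinement is to keep fewer recent messages until the result actually fits. The sketch below is only an illustration of that idea, not the code that produced the log; `compress_to_fit`, the 60-character cap on the summary, and the token estimate are all assumptions.

```python
def estimate(msgs) -> int:
    # Crude assumption: about 1 token per 4 characters of content.
    return sum(max(1, len(m["content"]) // 4) for m in msgs)


def compress_to_fit(msgs, max_tok, keep_recent=5, summary_chars=60):
    """Summarize older turns, keeping fewer recent ones until the budget is met."""
    if estimate(msgs) <= max_tok:
        return msgs
    system = [m for m in msgs if m["role"] == "system"]
    conv = [m for m in msgs if m["role"] != "system"]
    for keep in range(min(keep_recent, len(conv)), -1, -1):
        old = conv[:len(conv) - keep]
        recent = conv[len(conv) - keep:]
        text = "; ".join(m["content"][:20] for m in old)[:summary_chars]
        summary = [{"role": "system", "content": f"[History summary: {text}]"}] if old else []
        candidate = system + summary + recent
        if estimate(candidate) <= max_tok:
            return candidate
    return candidate   # smallest attempt; may still exceed a very tight budget


msgs = [{"role": "system", "content": "Be concise."}]
msgs += [{"role": "user", "content": f"Message {i}: " + "x" * 40} for i in range(10)]
result = compress_to_fit(msgs, max_tok=60)
print(estimate(msgs), "->", estimate(result))
print(len(result))    # 5: system prompt + summary + the 3 most recent messages
```

The trade-off is that a tight budget now costs you recent turns as well as old ones, so it is worth logging how many messages were kept.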


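Questions to Think About

  1. What is a context window? Explain it in your own words.
  2. Comparing the three strategies: which scenarios suit FIFO, LRU, and importance ranking?
  3. Checkpoints vs backups: what is the difference?
  4. What is a token? Explain it as simply as you can.
  5. Why does the system prompt get only 10%? What would happen if the allocation were reversed?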

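Summary

Today covered 6 core concepts. In Java terms:

Concept               Java analogy
Context window        Database connection pool
Three strategies      Queue / LinkedHashMap / priority queue
Checkpoint            Transaction Savepoint
Token budget          Memory allocator
Context compression   Data compression (GZIP)
Strategy pattern      Switchable compression strategies

In one sentence: context management = making the best choice with limited resources.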


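Next Steps

Possible topics for Day 17:

  • Multimodal context: handling images, audio, and video
  • Cross-session memory: a long-term memory system
  • Context sharing: multiple agents sharing one context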


If you found this helpful, leave a like before you go!