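🤖 Series: A 3-Month Learning Plan for Java Engineers Moving to AI Agents
👤 Author: 宸丶一 | 28-year-old Java programmer, currently learning AI Agent development...
🎯 Today's goals: context windows, automatic checkpoints, token budget control
💬 Personal motto: I don't know whether code will change the world, but first let me leave work on time.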
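Preface
Day 15 wrapped up the sub-agent system. Today's topic is more practical: smart context management.
Put simply, it solves one problem: the AI's "brain" is limited, so how do we make it remember what matters most?
It's like writing Java: a connection pool has a fixed size, and you can't open connections without limit. A context window is the same. There are only so many tokens, and once it fills up you have to decide what to throw out.
Today we'll cover what to throw out, how to throw it out, and whether to save before throwing.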
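Learning goals
- Understand what a context window is and why it is limited
- Learn three management strategies: FIFO / LRU / importance ranking
- Understand the automatic checkpoint system (the save mechanism)
- Get comfortable with token budget control and context compression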
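1. Context window: the AI's short-term memory
First, one concept: the context window is how much information the AI can see at one time.
Think of it as human short-term memory: you can only hold so many things in mind at once, and anything beyond that gets forgotten.
In Java terms: context window ≈ database connection pool. The pool has a fixed size; once the connections are used up, some must be reclaimed or new requests can't get in.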
[Flowchart: User input -> Context window -> AI model. When the window is full, a management strategy is needed: keep the important information, discard the unimportant.]
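Why is the window limited?
- More tokens mean more computation and higher cost
- GPU memory is finite, so only so much fits
- Too much information can actually confuse the AI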
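To make "the window is full" concrete, here is a minimal sketch of what a context window object could look like. The class name `ContextWindow`, the `fits` method, the `estimate_tokens` helper, and the 2-tokens-per-character rule are all assumptions for illustration; a real project would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: assume about 2 tokens per character."""
    return len(text) * 2


@dataclass
class ContextWindow:
    max_tokens: int = 4096
    messages: list = field(default_factory=list)
    current_tokens: int = 0

    def fits(self, text: str) -> bool:
        """True if `text` can be added without exceeding the window."""
        return self.current_tokens + estimate_tokens(text) <= self.max_tokens

    def add_message(self, role: str, content: str) -> bool:
        """Append a message if it fits; return False when the window is full."""
        if not self.fits(content):
            return False
        tokens = estimate_tokens(content)
        self.messages.append({"role": role, "content": content, "tokens": tokens})
        self.current_tokens += tokens
        return True


window = ContextWindow(max_tokens=40)
print(window.add_message("user", "hello agent"))   # True: 11 chars -> 22 tokens
print(window.add_message("user", "tell me more"))  # False: 22 + 24 tokens > 40
```

When `add_message` returns False, something has to be evicted first, which is where the management strategies come in.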
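2. Why not just keep the most recent messages?
You might think: when the window is full, delete the old messages and keep only the recent ones.
The problem: the most recent messages are not necessarily the most important.
Here's an example:
User: My name is 宸一, I'm 28, and I'm a Java programmer
AI: Hello 宸一!
User: What's the weather like today?
AI: It's sunny today, 25 degrees
User: What's my name? <- if only the most recent messages are kept, the AI has forgotten!
So we need smart management: not blindly deleting old messages, but judging which ones to keep and which to drop.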
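The failure in the example is easy to reproduce. Below is a small sketch, assuming a naive "keep the last n messages" rule; the helper names `keep_recent` and `mentions_name` are made up for this demo.

```python
history = [
    {"role": "user", "content": "My name is 宸一, I'm 28, a Java programmer"},
    {"role": "assistant", "content": "Hello 宸一!"},
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "It's sunny today, 25 degrees"},
    {"role": "user", "content": "What's my name?"},
]


def keep_recent(messages, n):
    """Naive truncation: keep only the last n messages."""
    return messages[-n:]


def mentions_name(messages, name="宸一"):
    """Check whether any surviving message still contains the name."""
    return any(name in m["content"] for m in messages)


truncated = keep_recent(history, 3)
print(mentions_name(history))    # True: the full history contains the name
print(mentions_name(truncated))  # False: the introduction was dropped
```

Once the introduction is gone, nothing left in the window can answer "What's my name?".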
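3. Three context management strategies
Here are the three strategies, each with its own use case:
| Strategy | How it works | Best for | Java analogy |
|---|---|---|---|
| FIFO | First in, first out: remove the oldest | Simple Q&A with no cross-message context | Queue |
| LRU | Remove the least recently accessed | Long conversations with pauses | LinkedHashMap (access order) |
| Importance | Remove the lowest-priority message | Project work with clear focal points | PriorityQueue |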
[Flowchart: Context window full -> choose a strategy. Simple Q&A: FIFO, delete the oldest. Long conversation: LRU, delete the least recently used. Project-related: importance, delete the lowest-weighted. Then free up space and add the new message.]
Code implementation (importance strategy):
python
def importance(window, new_msg):
    # Importance strategy: evict the lowest-scoring message until the new one fits
    need = len(new_msg) * 2  # rough estimate: about 2 tokens per character

    def score(m):
        s = 50
        if m["role"] == "system":
            s += 50  # system messages matter most
        if m is window.messages[-1]:
            s += 30  # the most recent message gets a bonus
        return s

    while window.current_tokens + need > window.max_tokens and window.messages:
        removed = min(window.messages, key=score)  # lowest score; ties go to the oldest
        window.messages.remove(removed)
        window.current_tokens -= removed["tokens"]
    return window.add_message("user", new_msg)
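The importance strategy is implemented above. For comparison, here is a rough sketch of the other two strategies from the table. The function and class names (`fifo_evict`, `LRUMessages`) and the message ids are assumptions for illustration; `deque` and `OrderedDict` play the roles that Queue and an access-ordered LinkedHashMap play in Java.

```python
from collections import OrderedDict, deque


def fifo_evict(messages: deque, current_tokens: int, need: int, max_tokens: int) -> int:
    """FIFO: drop the oldest messages until `need` more tokens fit."""
    while messages and current_tokens + need > max_tokens:
        oldest = messages.popleft()
        current_tokens -= oldest["tokens"]
    return current_tokens


class LRUMessages:
    """LRU: messages keyed by id; reading one marks it as recently used."""

    def __init__(self):
        self._items = OrderedDict()

    def add(self, msg_id, msg):
        self._items[msg_id] = msg        # new entries go to the "recent" end

    def touch(self, msg_id):
        self._items.move_to_end(msg_id)  # mark as recently used
        return self._items[msg_id]

    def evict(self):
        """Remove and return the least recently used (id, message) pair."""
        return self._items.popitem(last=False)


queue = deque([{"id": 1, "tokens": 40}, {"id": 2, "tokens": 30}, {"id": 3, "tokens": 20}])
remaining = fifo_evict(queue, current_tokens=90, need=30, max_tokens=100)
print(remaining, [m["id"] for m in queue])  # 50 [2, 3]

lru = LRUMessages()
for i in (1, 2, 3):
    lru.add(i, {"content": f"message {i}"})
lru.touch(1)                 # message 1 was just referenced again
evicted_id, _ = lru.evict()
print(evicted_id)            # 2: now the least recently used
```

FIFO only looks at arrival order, while LRU lets an old message survive as long as it keeps being referenced.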
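4. Automatic checkpoint system: the save mechanism
With the strategies covered, you might ask: can discarded messages be recovered?
The answer: no, but you can save before discarding. That is what the checkpoint system is for.
It's like save points in a video game: save automatically at key moments, and if things go wrong you can reload and try again.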
[Flowchart: Conversation in progress -> trigger condition? (message count reaches N / topic changes / token usage too high; otherwise nothing fires) -> auto-save checkpoint -> save a snapshot of the current state -> need to roll back? If yes, load the most recent checkpoint.]
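The save-and-reload idea can be sketched in a few lines. This is only an illustration under assumptions: the class name `CheckpointManager`, the `load_latest` method, and the in-memory list are hypothetical, and `save(state, meta)` is shaped to match the `mgr.save(...)` call used in this section's trigger code.

```python
import copy
import time


class CheckpointManager:
    """Keeps in-memory snapshots of conversation state (hypothetical helper)."""

    def __init__(self, max_checkpoints: int = 5):
        self.max_checkpoints = max_checkpoints
        self.checkpoints = []

    def save(self, state: dict, meta: dict) -> int:
        """Store a deep copy so later edits to `state` cannot alter the snapshot."""
        self.checkpoints.append(
            {"state": copy.deepcopy(state), "meta": dict(meta), "time": time.time()}
        )
        # Drop the oldest snapshots once the cap is exceeded.
        del self.checkpoints[:-self.max_checkpoints]
        return len(self.checkpoints)

    def load_latest(self) -> dict:
        """Return a copy of the most recent snapshot, for rolling back."""
        if not self.checkpoints:
            raise LookupError("no checkpoint saved yet")
        return copy.deepcopy(self.checkpoints[-1]["state"])


mgr = CheckpointManager()
state = {"messages": ["hi"]}
mgr.save(state, {"trigger": "manual"})
state["messages"].append("oops")  # the live state keeps changing
print(mgr.load_latest())          # {'messages': ['hi']}
```

The deep copy is the important design choice: without it, the "snapshot" would just be another reference to the live state and would change along with it.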
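Three automatic trigger strategies:
| Trigger | How it works | Best for |
|---|---|---|
| Message count | Save every N messages | Task-focused conversations with steady output |
| Token threshold | Save once token usage reaches 80% | Resource-constrained scenarios |
| Topic change | Save when the topic changes | Project analysis, multi-topic conversations |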
Code implementation:
python
class AutoSave:
    @staticmethod
    def on_topic_change(mgr, state, new_topic, old_topic):
        # Save a checkpoint whenever the topic changes
        if old_topic and new_topic != old_topic:
            mgr.save(state, {"trigger": "topic change", "topic": new_topic})
            return True
        return False
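The code above covers the topic-change trigger. The other two triggers from the table could look roughly like this; the function names and default values (`every_n=5`, `threshold=0.8`) are assumptions, with 0.8 taken from the 80% figure in the table.

```python
def should_save_on_count(message_count: int, every_n: int = 5) -> bool:
    """Message-count trigger: fire on every N-th message."""
    return message_count > 0 and message_count % every_n == 0


def should_save_on_tokens(used_tokens: int, max_tokens: int, threshold: float = 0.8) -> bool:
    """Token-threshold trigger: fire once usage reaches the threshold share."""
    return used_tokens >= max_tokens * threshold


print(should_save_on_count(10))           # True: 10 is a multiple of 5
print(should_save_on_count(7))            # False
print(should_save_on_tokens(3300, 4096))  # True: 80% of 4096 is 3276.8
print(should_save_on_tokens(3000, 4096))  # False
```

Note that the token trigger as written keeps firing on every message once usage is past the threshold; a fuller version would probably remember that it already saved.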
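5. Token budget control: spend where it counts
Tokens are the AI's money, and you have to allocate them well: how much for the system prompt, how much for conversation history, how much for the current input, and how much to reserve for the response.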
[Pie chart: Token budget allocation (total 4096): system prompt 10%, conversation history 60%, current input 15%, reserved response 15%]
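Why split it this way?
- System prompt 10%: fixed, just a few sentences, doesn't need much
- Conversation history 60%: keeps growing, so it needs enough room
- Current input 15%: the user's current question
- Reserved response 15%: room left for the AI's answer
What if it were the other way around? With the system prompt at 60% and history at only 10%, history could hold only a few messages, so the AI would answer as if it had amnesia, with the earlier context lost.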
Code implementation:
python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    total: int = 4096            # total budget
    system_pct: float = 0.10     # system prompt 10%
    history_pct: float = 0.60    # conversation history 60%
    current_pct: float = 0.15    # current input 15%
    response_pct: float = 0.15   # reserved response 15%
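To see what those percentages mean in actual tokens, here is a small worked sketch. The `allocate` function and the `SHARES` dict are assumptions for illustration, not part of the `TokenBudget` class; the rounding choice (`round`) is also mine.

```python
def allocate(total: int, shares: dict) -> dict:
    """Split a token budget by percentage shares (shares must sum to 1.0)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return {name: round(total * pct) for name, pct in shares.items()}


SHARES = {"system": 0.10, "history": 0.60, "current": 0.15, "response": 0.15}

print(allocate(200, SHARES))
# {'system': 20, 'history': 120, 'current': 30, 'response': 30}
print(allocate(4096, SHARES))
# {'system': 410, 'history': 2458, 'current': 614, 'response': 614}
```

Rounding each share separately happens to add up to the total for these two budgets, but that is not guaranteed in general; a stricter version could hand any leftover tokens to the history share.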
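6. Context compression: drop what needs dropping
The budget is set, but conversation history will still overflow. That's when compression comes in: turn old messages into a summary.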
[Flowchart: Original history of 20 messages -> over the token budget? If not, keep as is. If so: separate out the system messages, keep the 5 most recent, compress older messages into a summary, then merge: system messages + summary + 5 most recent.]
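Compression logic:
- System messages: always kept, never touched
- Most recent 5 messages: kept, since the latest are the most relevant
- Older messages: compressed into a one-line summary, e.g. History summary: learn Python; pick resources; learn AI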
Code implementation:
python
class Compressor:
    @staticmethod
    def compress(msgs, max_tok):
        # TokenEst is this project's token estimator (not shown in this post)
        cur = TokenEst.estimate_msgs(msgs)
        if cur <= max_tok:
            return msgs  # within budget, nothing to compress
        sys_msgs = [m for m in msgs if m.get("role") == "system"]
        conv = [m for m in msgs if m.get("role") != "system"]
        recent = conv[-5:]  # at most the 5 most recent messages
        old = conv[:-5]     # everything older (empty when len(conv) <= 5)
        result = sys_msgs.copy()
        if old:
            topics = []  # a list keeps first-seen order; a set would make the summary order unpredictable
            for m in old:
                c = m.get("content", "")
                t = c[:30] + "..." if len(c) > 30 else c
                if t not in topics:
                    topics.append(t)
            result.append({"role": "system", "content": "[History summary: " + "; ".join(topics[:3]) + "]"})
        result.extend(recent)
        return result
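One thing `compress` does not do is check that its result actually fits `max_tok`. Here is a sketch of one way to add that check, by keeping fewer recent messages until the result fits. Everything in it is an assumption for illustration: the `estimate` function (2 tokens per character) is a stand-in for `TokenEst`, and `summarize` and `compress_to_budget` are hypothetical names.

```python
def estimate(msgs) -> int:
    """Stand-in estimator: about 2 tokens per character of content."""
    return sum(len(m.get("content", "")) * 2 for m in msgs)


def summarize(old, limit=3, width=30):
    """Join the first few distinct snippets into one summary message."""
    snippets = []
    for m in old:
        c = m.get("content", "")
        s = c[:width] + "..." if len(c) > width else c
        if s not in snippets:
            snippets.append(s)
    return {"role": "system", "content": "[History summary: " + "; ".join(snippets[:limit]) + "]"}


def compress_to_budget(msgs, max_tok, keep_recent=5):
    """Keep system messages, summarize the rest, and shrink until within budget."""
    if estimate(msgs) <= max_tok:
        return msgs
    sys_msgs = [m for m in msgs if m.get("role") == "system"]
    conv = [m for m in msgs if m.get("role") != "system"]
    # Try keeping 5 recent messages, then 4, ... down to 0.
    for keep in range(min(keep_recent, len(conv)), -1, -1):
        recent = conv[len(conv) - keep:]
        old = conv[:len(conv) - keep]
        result = sys_msgs + ([summarize(old)] if old else []) + recent
        if estimate(result) <= max_tok:
            return result
    return result  # still over budget: even system messages plus summary do not fit


msgs = [{"role": "system", "content": "Be brief."}]
msgs += [{"role": "user", "content": f"question {i} about Python basics"} for i in range(8)]
out = compress_to_budget(msgs, max_tok=320)
print(estimate(msgs), "->", estimate(out), "tokens,", len(out), "messages")
```

The summary line itself costs tokens, so with short messages it can eat most of the budget; that is why the loop has to give up recent messages one at a time rather than assume that "summary + 5 most recent" will fit.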
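Checkpoint vs. backup
These two concepts are easy to mix up. In short:
| Property | Checkpoint | Backup |
|---|---|---|
| Type | Incremental | Full |
| Frequency | Frequent, saved automatically at key moments | Irregular, triggered manually |
| Size | Lightweight, stores only the current state | Complete, a full copy |
| Purpose | Roll back to a given state | Disaster recovery |
| Java analogy | Transaction Savepoint | Database backup |
In one sentence: a checkpoint is a save, a backup is a copy of the save file.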
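The incremental vs. full distinction can be shown with a toy example. This sketch assumes an append-only message list (evicting or rewriting messages would break the deltas), and the `History` class and its method names are hypothetical.

```python
import json


class History:
    """Conversation log with incremental checkpoints and full backups (illustrative)."""

    def __init__(self):
        self.messages = []
        self._deltas = []       # each checkpoint stores only messages added since the last one
        self._checkpointed = 0  # how many messages earlier checkpoints already cover

    def checkpoint(self) -> int:
        """Incremental: record just the new messages; return how many were stored."""
        delta = self.messages[self._checkpointed:]
        self._deltas.append(list(delta))
        self._checkpointed = len(self.messages)
        return len(delta)

    def restore(self, upto: int) -> list:
        """Rebuild the state as of checkpoint number `upto` (1-based) by replaying deltas."""
        return [m for delta in self._deltas[:upto] for m in delta]

    def backup(self) -> str:
        """Full: serialize the entire history, independent of any checkpoint."""
        return json.dumps(self.messages, ensure_ascii=False)


h = History()
h.messages += ["a", "b"]
print(h.checkpoint())  # 2: both messages are new
h.messages.append("c")
print(h.checkpoint())  # 1: only "c" is stored this time
print(h.restore(1))    # ['a', 'b']
print(h.backup())      # ["a", "b", "c"]
```

Each checkpoint is cheap because it stores only the difference, while the backup always pays for the whole history.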
Full flow: smart context management
Putting the concepts above together, the whole flow looks like this:
[Flowchart: User enters a new message -> estimate token count -> over the history budget? If not, add directly to history. If so, trigger compression: keep system messages + the 5 most recent, compress older messages into a summary. Then: checkpoint triggered? If yes, save a state snapshot. Finally, assemble the full context and send it to the AI model.]
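As a rough end-to-end sketch of this flow, the steps can be wired together in one small class. All names here (`ContextManager`, `handle`, `checkpoint_every`) and the 2-tokens-per-character estimate are assumptions for illustration, and the compression is deliberately simplified compared with the `Compressor` class.

```python
import copy


def estimate(msgs) -> int:
    """Stand-in estimator: about 2 tokens per character of content."""
    return sum(len(m["content"]) * 2 for m in msgs)


class ContextManager:
    def __init__(self, system_prompt, history_budget, checkpoint_every=4):
        self.system = {"role": "system", "content": system_prompt}
        self.history = []
        self.history_budget = history_budget
        self.checkpoint_every = checkpoint_every
        self.checkpoints = []
        self.seen = 0

    def _compress(self):
        """Keep the 5 most recent messages; fold older ones into one summary."""
        recent, old = self.history[-5:], self.history[:-5]
        if not old:
            return
        snippets = "; ".join(m["content"][:30] for m in old[:3])
        summary = {"role": "system", "content": f"[History summary: {snippets}]"}
        self.history = [summary] + recent

    def handle(self, user_msg):
        """Run one turn of the flow and return the context to send to the model."""
        self.history.append({"role": "user", "content": user_msg})
        self.seen += 1
        if estimate(self.history) > self.history_budget:  # over the history budget?
            self._compress()
        if self.seen % self.checkpoint_every == 0:         # checkpoint triggered?
            self.checkpoints.append(copy.deepcopy(self.history))
        return [self.system] + self.history                # assembled context


cm = ContextManager("Be brief.", history_budget=700)
for i in range(8):
    ctx = cm.handle(f"Question {i}: how do I manage context windows in an agent?")
print(len(cm.history), "history messages,", len(cm.checkpoints), "checkpoints")
# 6 history messages, 2 checkpoints
```

The sketch uses a message-count trigger for checkpoints; swapping in the topic-change or token-threshold trigger would only change the condition in `handle`.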
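Results
Let's run the code and see what actually happens:
Token budget allocation (total=200):
system : 20 tokens (10%)
history : 120 tokens (60%)
current : 30 tokens (15%)
response : 30 tokens (15%)
[user] Hello, I want to learn Python
History: 17/120 tokens, 1 message
[assistant] Python is great for beginners, with clean syntax.
History: 46/120 tokens, 2 messages
...
[assistant] Master Python basics first, then learn machine learning.
[Compress] 258 tokens -> target 120
[Compression done] 258 -> 268 tokens
History: 268/120 tokens, 10 messages
As the log shows, compression is triggered automatically once history tokens exceed the budget. Note, however, that this run went from 258 to 268 tokens and is still above the 120-token target: compress returns its result without re-checking it against max_tok, so a summary is not guaranteed to fit the budget.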
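Questions to think about
- What is a context window? Explain it in your own words.
- Comparing the three strategies: which scenarios suit FIFO, LRU, and importance ranking?
- Checkpoint vs. backup: what's the difference?
- What is a token? Explain it as simply as you can.
- Why does the system prompt get only 10%? What would happen if the allocation were reversed?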
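Summary
Today covered 6 core concepts. In Java terms:
| Concept | Java analogy |
|---|---|
| Context window | Database connection pool |
| Three strategies | Queue / LinkedHashMap / PriorityQueue |
| Checkpoint | Transaction Savepoint |
| Token budget | Memory allocator |
| Context compression | GZIP data compression |
| Switchable management strategies | Strategy pattern |
In one sentence: context management = making the best choice under limited resources.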
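Next steps
Possible topics for Day 17:
- Multimodal context: handling images, audio, and video
- Cross-session memory: long-term memory systems
- Context sharing: multiple agents sharing context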
If you found this helpful, leave a like before you go!