LangGraph Node-Level Caching

[LangGraph Advanced] Node-Level Caching Explained: Significantly Reducing Latency and Token Cost in LLM Applications

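Preface

In complex LLM and multi-agent (AI Agents) workflows, **latency** and **token cost** are often the two main obstacles to taking an application to production. For example, when debugging an agent, or when running steps that fetch fixed data or perform heavy analysis of long text, recomputing every node from scratch on each run wastes a great deal of resources.

To address this, LangGraph provides official **node-level caching**. By configuring a cache policy on a node in the graph, LangGraph can reuse the output computed earlier for the same node input. When the input has not changed, the cached node returns almost immediately: it consumes no tokens and responds in roughly milliseconds.

This article walks through the core concepts and working principles of LangGraph node caching, and uses a complete code example to show how to apply it in a real project.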

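I. What Is LangGraph Node-Level Caching?

In LangGraph, a graph execution is made up of multiple nodes, and nodes pass information to one another through a shared state.

The core logic of node-level caching is this: when a node receives exactly the same input state as in an earlier run, LangGraph does not re-execute the node's function (for example, an LLM call or a slow API request). Instead, it reads the previous output from the cache store and applies it to the graph state.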

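Core concepts compared

To avoid confusion, it helps to distinguish the storage mechanisms LangGraph offers:

Checkpointer: saves the execution history of a graph, enabling resumption from a checkpoint, human-in-the-loop (HITL) workflows, and multi-turn conversation memory (thread state).
Store (cross-thread storage): shares long-term memory, such as user preferences, across users and sessions.
Node caching: caches the inputs and outputs of nodes that are highly deterministic or slow to run, either temporarily or long term, in order to cut cost and latency.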

II. Node Cache Flow Diagram (Mermaid)

Before looking at code, the flowchart below shows what happens inside LangGraph when it executes a node that has a cache policy:
[Flowchart: execution of a node that has a cache policy]

Graph starts -> about to enter the target node -> compute a hash of the input and check whether the cache backend has an entry for it.
Hit (entry found and not expired): fetch the cached output directly -> update the global state -> node finishes and the flow moves to the next step.
Miss (no entry, or entry expired): actually execute the node function (an LLM or API call taking 3 seconds) -> write the result to the cache backend and assign a TTL -> node finishes and the flow moves to the next step.
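
The hit and miss paths above can be sketched in plain Python. This is a toy model of the idea, not LangGraph's actual implementation: `TinyNodeCache`, `run_node`, and `slow_node` are hypothetical names, and hashing the pickled input with SHA-256 is an assumption made for the sketch.

```python
import hashlib
import pickle
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TinyNodeCache:
    """Toy in-memory backend: key -> (output, expiry timestamp or None)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Any, Optional[float]]] = {}

    def get(self, key: str) -> Tuple[bool, Any]:
        entry = self._entries.get(key)
        if entry is None:
            return False, None  # miss: never stored
        output, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._entries[key]
            return False, None  # miss: entry expired
        return True, output     # hit

    def set(self, key: str, output: Any, ttl: Optional[float]) -> None:
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._entries[key] = (output, expires_at)


def run_node(
    node: Callable[[dict], dict],
    state: dict,
    cache: TinyNodeCache,
    ttl: Optional[float] = None,
) -> dict:
    # Hash the input state to obtain the cache key
    key = hashlib.sha256(pickle.dumps(state)).hexdigest()
    hit, output = cache.get(key)
    if not hit:
        output = node(state)          # really execute the node function
        cache.set(key, output, ttl)   # store the result and assign the TTL
    return {**state, **output}        # apply the output to the state


calls = []


def slow_node(state: dict) -> dict:
    calls.append(state["query"])      # record every real execution
    return {"result": state["query"].upper()}


cache = TinyNodeCache()
first = run_node(slow_node, {"query": "hello"}, cache, ttl=0.2)   # miss
second = run_node(slow_node, {"query": "hello"}, cache, ttl=0.2)  # hit
time.sleep(0.3)                                                   # let the TTL lapse
third = run_node(slow_node, {"query": "hello"}, cache, ttl=0.2)   # expired, recomputed
print(len(calls))  # 2
```

The node function runs twice rather than three times: the second call is served from the cache, and the third call recomputes because the entry has expired.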

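III. The Two Core Components of Node Caching

For LangGraph's caching to work, two parts have to cooperate:

Cache backend: defines where cached entries are stored (in memory, SQLite, Redis, and so on). It is injected at compile time via builder.compile(cache=...).
Cache policy: defines which nodes are cached and for how long. It is configured per node via builder.add_node(..., cache_policy=...).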

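1. Cache policy parameters (CachePolicy)

CachePolicy mainly supports two parameters:

key_func: generates the cache key from the node's input. By default, the input is serialized with pickle and hashed.
ttl (time to live): how long a cache entry stays valid, in seconds. If it is None (the default), the entry never expires unless it is cleared manually.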
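
A custom key_func is useful when only part of the input should decide whether two calls count as "the same". The sketch below is a standalone key function that looks only at a normalized `query` field. It assumes key_func is called with the node's input state and returns a string; the function name `query_only_key` and the `request_id` field are hypothetical.

```python
import hashlib
import json
from typing import Any, Mapping


def query_only_key(state: Mapping[str, Any]) -> str:
    """Build a cache key from the normalized query, ignoring other fields."""
    # Lowercase and collapse whitespace so trivially different inputs share a key
    normalized = " ".join(str(state["query"]).lower().split())
    payload = json.dumps({"query": normalized}, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Assumed wiring (not executed here):
#   CachePolicy(key_func=query_only_key, ttl=60)

key_a = query_only_key({"query": "Hello  LangGraph", "request_id": "abc"})
key_b = query_only_key({"query": "hello langgraph", "request_id": "xyz"})
print(key_a == key_b)  # True
```

The trade-off is that any field left out of the key no longer invalidates the cache, so only exclude fields that truly do not affect the node's output.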

IV. Hands-On: Enabling Node Caching in LangGraph

The complete Python example below shows how to configure InMemoryCache and attach a cache policy to a specific node.

1. Environment setup

Make sure recent versions of langgraph and langchain-core are installed:


pip install -U langgraph langchain-core

2. Complete implementation


import time
from typing import TypedDict

from langchain_core.runnables import RunnableConfig
from langgraph.graph import START, END, StateGraph
# Cache-related components
from langgraph.cache.memory import InMemoryCache
from langgraph.types import CachePolicy

# 1. Define the graph state
class State(TypedDict):
    query: str
    result: str

# 2. Define a node that simulates a high-latency, high-cost step
def heavy_computation_node(state: State, config: RunnableConfig) -> dict:
    print(f"--- [Node] Running expensive computation, input query: {state['query']} ---")
    # Simulate a slow operation (e.g. complex LLM inference or an external API request)
    time.sleep(3)
    return {"result": f"Done: {state['query'].upper()}"}

# 3. Build the state graph
builder = StateGraph(State)

# [Key step 1] Configure the cache policy (cache_policy) in add_node
# Here the cache entry expires after 10 seconds
my_cache_policy = CachePolicy(ttl=10)

builder.add_node(
    "heavy_node",
    heavy_computation_node,
    cache_policy=my_cache_policy,
)

builder.add_edge(START, "heavy_node")
builder.add_edge("heavy_node", END)

# [Key step 2] Inject the cache backend when compiling the graph (in-memory here)
memory_cache = InMemoryCache()
graph = builder.compile(cache=memory_cache)

# 4. Test the caching behavior
if __name__ == "__main__":
    input_data = {"query": "hello langgraph"}

    # Run 1: cache miss, the node executes normally (about 3 seconds)
    print("\n>>> Run 1 (expected: node executes)...")
    start_time = time.perf_counter()
    res1 = graph.invoke(input_data)
    print(f"Result: {res1}")
    print(f"Run 1 took: {time.perf_counter() - start_time:.2f} s\n")

    # Run 2: same input, still within the TTL, cache hit (close to 0 seconds)
    print(">>> Run 2 (same input, expected: cache hit)...")
    start_time = time.perf_counter()
    res2 = graph.invoke(input_data)
    print(f"Result: {res2}")
    print(f"Run 2 took: {time.perf_counter() - start_time:.2f} s\n")

    # Run 3: different input, cache miss (about 3 seconds)
    print(">>> Run 3 (different input, expected: node executes)...")
    start_time = time.perf_counter()
    res3 = graph.invoke({"query": "hello world"})
    print(f"Result: {res3}")
    print(f"Run 3 took: {time.perf_counter() - start_time:.2f} s\n")

    # Run 4: wait 11 seconds so the TTL expires, then run again
    print(">>> Waiting 11 seconds for the cache entry to expire...")
    time.sleep(11)
    print(">>> Run 4 (same input, but expired, expected: recompute)...")
    start_time = time.perf_counter()
    res4 = graph.invoke(input_data)
    print(f"Result: {res4}")
    print(f"Run 4 took: {time.perf_counter() - start_time:.2f} s\n")

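3. Analyzing the console output

Running the code above prints output like the following (exact timings will vary slightly between runs):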


>>> Run 1 (expected: node executes)...
--- [Node] Running expensive computation, input query: hello langgraph ---
Result: {'query': 'hello langgraph', 'result': 'Done: HELLO LANGGRAPH'}
Run 1 took: 3.01 s

>>> Run 2 (same input, expected: cache hit)...
Result: {'query': 'hello langgraph', 'result': 'Done: HELLO LANGGRAPH'}
Run 2 took: 0.00 s

>>> Run 3 (different input, expected: node executes)...
--- [Node] Running expensive computation, input query: hello world ---
Result: {'query': 'hello world', 'result': 'Done: HELLO WORLD'}
Run 3 took: 3.01 s

>>> Waiting 11 seconds for the cache entry to expire...
>>> Run 4 (same input, but expired, expected: recompute)...
--- [Node] Running expensive computation, input query: hello langgraph ---
Result: {'query': 'hello langgraph', 'result': 'Done: HELLO LANGGRAPH'}
Run 4 took: 3.02 s

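Conclusion: on the second run the console does not print the "--- [Node] Running expensive computation ..." line from inside the node, and the elapsed time drops to 0.00 seconds. This shows that the node function was skipped and the cached result was used directly.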

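V. Best Practices for Production

1. Choose carefully which nodes to cache:

Good candidates: knowledge-base retrieval nodes in RAG applications, structured-format parsing nodes, and tool-call nodes (Tool Node) whose results rarely change.
Poor candidates: creative-generation nodes that rely on randomness (high temperature), and nodes that depend on the current time or on live user input.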

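2. Choose a suitable cache backend:

Experiments and local development: use InMemoryCache.
Distributed and production deployments: consider SqliteCache or RedisCache so that cached entries persist across restarts. For a multi-instance deployment the backend must be reachable from every instance, which usually points to Redis, since a SQLite file is local to one host.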

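3. Using caching with prebuilt agents:

If you use a prebuilt constructor such as create_react_agent, the nodes are generated internally, so you cannot attach a cache_policy directly through add_node. In that case you can usually either access the internal node dictionary after the graph is created, or switch to a manual graph definition to get finer-grained control.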

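Summary

LangGraph's node-level caching fills in an important piece for putting agents into production. With a simple CachePolicy and a cache backend, developers can intercept repeated computation at the framework level. This shortens the wait for end users and keeps token spending under control.

If you found this article helpful, feel free to like, bookmark, and follow! If you have any questions about LangGraph, join the discussion in the comments.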
