LangGraph Agent Architecture Basics: From Concepts to Your First Runnable Agent

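How Agents Differ from Traditional Applications

Imagine you need to build an application that answers complex user questions. The traditional approach is: the user asks → your code queries a database or calls an API → the result is returned. The flow is predefined and linear.

But what if the question is more involved? For example: "Check the latest weather, recommend what to wear today based on it, and then send a reminder message." This is no longer a fixed chain. The LLM has to think → decide which tool to call → get the result → think again or call more tools → finally give an answer.

That is the core of an agent. An agent does not follow a fixed execution path; it is a decision-maker that adjusts its behavior at run time based on context and tool feedback.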

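The Dynamic Nature of Agents: The ReAct Loop

Agents follow the ReAct pattern (Reasoning + Acting), a simple but powerful loop:

1. Reasoning: the LLM reads the user's message and any earlier tool results and asks, "What do I need to do?"
2. Acting: the LLM decides which tool to call (or answers directly) and emits a tool call request
3. Observing: the tool runs and returns a result
4. Repeat: until the LLM decides the problem is solved and gives a final answer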

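A concrete example:

User: "What time is it in Beijing right now?"

✓ Thought: "I need Beijing's time zone, then I can compute from the current UTC time"
✓ Action: call get_timezone("Beijing") and get_current_utc_time()
✓ Observation: got "UTC+8" and "2024-01-15 10:00 UTC"
✓ Thought: "UTC+8 means adding 8 hours. 10:00 + 8 = 18:00"
✓ Action: return the final answer "It is 6 PM in Beijing"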
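
To make the loop concrete, here is a minimal sketch of the same trace in plain Python with no LLM involved. The function scripted_policy is a hypothetical stand-in for the model's reasoning, and get_utc_offset_hours is an illustrative helper invented for this sketch (it returns a number rather than the "UTC+8" string shown above); a real agent would let the LLM choose the tool and arguments.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tools for this sketch; real ones would call external services.
def get_utc_offset_hours(city: str) -> int:
    return {"Beijing": 8}[city]

def get_current_utc_time() -> datetime:
    # Fixed value so the example is reproducible.
    return datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)

TOOLS = {
    "get_utc_offset_hours": get_utc_offset_hours,
    "get_current_utc_time": get_current_utc_time,
}

def scripted_policy(question: str, observations: dict) -> dict:
    """Stand-in for the LLM: pick the next action from what is known so far."""
    if "get_utc_offset_hours" not in observations:
        return {"tool": "get_utc_offset_hours", "args": {"city": "Beijing"}}
    if "get_current_utc_time" not in observations:
        return {"tool": "get_current_utc_time", "args": {}}
    offset = timedelta(hours=observations["get_utc_offset_hours"])
    local = observations["get_current_utc_time"] + offset
    return {"answer": f"It is {local:%H:%M} in Beijing"}

def react_loop(question: str, max_steps: int = 5) -> str:
    observations = {}
    for _ in range(max_steps):
        decision = scripted_policy(question, observations)    # Reasoning
        if "answer" in decision:
            return decision["answer"]                         # final answer
        result = TOOLS[decision["tool"]](**decision["args"])  # Acting
        observations[decision["tool"]] = result               # Observing
    raise RuntimeError("no answer within max_steps")

answer = react_loop("What time is it in Beijing?")
print(answer)
```

The max_steps guard is a design choice of this sketch: a loop driven by model decisions should always have an upper bound.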

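This capability looks simple, but it opens up a whole new range of applications: autonomous workflows, information retrieval, data analysis, API automation, and more.

Why LangGraph Instead of a Simple Script?

You might be thinking: "Why can't I just implement this with a simple Python script?"

You can, but you will run into the following problems:

State management: the messages and results produced at each step have to be recorded and managed. If the agent runs 20 steps, you need to track the context of all 20
Fault tolerance: if the network drops or an API times out, you want to resume from where it stopped rather than start over. That requires persistence and a checkpoint mechanism
Debugging and monitoring: in production you need to see what the agent actually did, why it did it, and how many tokens it used. A plain script gives you none of this visibility unless you build it yourself
Async concurrency: a real agent may call several tools at the same time. Hand-written concurrency in a script is complex and error-prone

LangGraph was designed to address these problems. It provides:

✅ State management, with persistence once a checkpointer is configured

✅ Fault tolerance and recovery based on checkpoints

✅ Integration with LangSmith for observability

✅ Native async support

✅ A structure that is easy to test and deploy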

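LangGraph Core Concepts

Before diving into code, we need to understand the three building blocks of LangGraph: State, Nodes, and Edges.

The Basic Parts of a Graph: State, Nodes, Edges

LangGraph models your agent as a directed graph. The graph has three elements:

1. State

State is the shared data of the graph at any point in time. It is typically a TypedDict (a dataclass or Pydantic model also works) that holds everything the agent needs.

A simple example:

python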

from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: list  # conversation history
    documents: list  # retrieved documents
    current_question: str  # the user's current question

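State is like the agent's "brain": it records everything the agent knows. After each step runs, the State is updated.

2. Nodes

A Node is a unit of work in the graph. It is a function that receives the current State, processes it, and returns an update to the State.

A simple example:

python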

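def reasoning_node(state: AgentState) -> dict:
    """Let the LLM decide what to do next."""
    # Reason over state["messages"]; `llm` is a chat model created elsewhere
    decision = llm.invoke(state["messages"])
    # AgentState.messages is a plain list with no reducer, so return the full new list
    return {"messages": state["messages"] + [decision]}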

Each Node is independent and reusable. You might have an "LLM Node", a "Tool Execution Node", a "Data Processing Node", and so on.
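
The following sketch illustrates the contract described above: a node returns only the keys it wants to change, and the framework merges that update into the State. The apply_update helper is a simplified assumption written for this example, not LangGraph's actual merge logic; it only shows the default "overwrite the key" behavior.

```python
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    current_question: str

def uppercase_node(state: AgentState) -> dict:
    """A node reads the state and returns only the keys it wants to change."""
    return {"current_question": state["current_question"].upper()}

def apply_update(state: AgentState, update: dict) -> AgentState:
    """Simplified merge: keys in the update overwrite keys in the state."""
    return {**state, **update}

state: AgentState = {"messages": ["hi"], "current_question": "what time is it?"}
new_state = apply_update(state, uppercase_node(state))
print(new_state)
```

Because the node never mutates its input, the previous state stays intact, which is what makes snapshots and replays straightforward.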

3. Edges

An Edge defines the flow between nodes. It connects two nodes and states under which condition execution moves from one to the other.

A simple example:

python

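from langgraph.graph import END

# `builder` is a StateGraph(AgentState) that already has these nodes added

# Unconditional edge: always go from node_A to node_B
builder.add_edge("node_A", "node_B")

# Conditional edge: the function's return value selects the next node
def should_continue(state: AgentState):
    if state["messages"][-1].content == "DONE":
        return "end"
    return "continue"

builder.add_conditional_edges(
    "decision_node",
    should_continue,
    {"end": END, "continue": "action_node"}
)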

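The graph, visualized:

START
  ↓
[LLM reasoning node] ← receives the State, picks a tool
  ↓
[Condition] ← is a tool call needed?
  ├─ yes → [Tool execution node] ↺ loops back to reasoning
  └─ no → END (return the answer)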
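
To show how nodes, unconditional edges, and conditional edges fit together, here is a toy graph executor in plain Python. It is a sketch written for this post under simplifying assumptions (messages are plain strings, the "LLM" is scripted, the executor is a simple loop) and is not how LangGraph is implemented.

```python
END = "__end__"

def llm_node(state: dict) -> dict:
    # Scripted model: request a tool once, then answer.
    if not any(m.startswith("tool:") for m in state["messages"]):
        return {"messages": state["messages"] + ["ai: call get_weather"]}
    return {"messages": state["messages"] + ["ai: it is sunny"]}

def tool_node(state: dict) -> dict:
    return {"messages": state["messages"] + ["tool: Sunny, 15°C"]}

def route_after_llm(state: dict) -> str:
    """Conditional edge: go to the tool node only if a tool was requested."""
    return "tools" if state["messages"][-1] == "ai: call get_weather" else END

NODES = {"llm": llm_node, "tools": tool_node}
EDGES = {"tools": "llm"}                # unconditional edge
CONDITIONAL = {"llm": route_after_llm}  # conditional edge

def run(state: dict, start: str = "llm", max_steps: int = 10):
    current, visited = start, []
    for _ in range(max_steps):
        if current == END:
            return state, visited
        visited.append(current)
        state = {**state, **NODES[current](state)}
        if current in CONDITIONAL:
            current = CONDITIONAL[current](state)
        else:
            current = EDGES[current]
    raise RuntimeError("step limit reached")

final_state, visited = run({"messages": ["human: weather in Beijing?"]})
print(visited)
```

The visited list makes the loop in the diagram visible: the LLM node runs, the tool node runs, and the LLM node runs again before the graph ends.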

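Message State: MessagesState

In practice, most chat-style agents use a ready-made State type: MessagesState. It already defines a messages key whose updates are appended to the history rather than overwriting it.

python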

from langgraph.graph import MessagesState

# That is all you need. MessagesState provides:
# - messages: the message list (HumanMessage, AIMessage, ToolMessage, etc.)
# - a reducer that appends the messages a node returns to that list

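Why is MessagesState so useful? Because OpenAI, Claude, and most other modern chat models work with a list of messages. MessagesState lets you pass the messages straight to the LLM without extra conversion.

An example of the message flow:

python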

# The messages list in the State keeps growing:

# Initial state
messages = [
    HumanMessage(content="北京现在几点？")
]

# After the LLM node
messages = [
    HumanMessage(content="北京现在几点？"),
    AIMessage(content="I need to check the time in Beijing", tool_calls=[...])
]

# After the tool node
messages = [
    HumanMessage(content="北京现在几点？"),
    AIMessage(content="I need to check the time in Beijing", tool_calls=[...]),
    ToolMessage(content="Beijing timezone: UTC+8, Current UTC time: 10:00 UTC", tool_call_id="...")
]

# After the next LLM node
messages = [
    HumanMessage(content="北京现在几点？"),
    AIMessage(content="I need to check the time in Beijing", tool_calls=[...]),
    ToolMessage(content="Beijing timezone: UTC+8, Current UTC time: 10:00 UTC", tool_call_id="..."),
    AIMessage(content="Based on UTC+8, it's 18:00 in Beijing")  # final answer
]
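
The growth shown above depends on the messages key being merged with an append-style reducer, which is why a node can return only its new message. The sketch below imitates that idea in plain Python. The names append_messages, REDUCERS, and apply_update are assumptions made for this illustration; LangGraph's own reducer for messages does more than this (for example, it takes message IDs into account).

```python
def append_messages(existing: list, new: list) -> list:
    """Simplified reducer: append new messages to the existing list."""
    return existing + new

# Keys listed here are merged with a reducer; all other keys are overwritten.
REDUCERS = {"messages": append_messages}

def apply_update(state: dict, update: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(state.get(key, []), value) if reducer else value
    return merged

state = {"messages": ["human: What time is it in Beijing?"]}
state = apply_update(state, {"messages": ["ai: (tool call)"]})
state = apply_update(state, {"messages": ["tool: UTC+8, 10:00 UTC"]})
state = apply_update(state, {"messages": ["ai: It is 18:00 in Beijing"]})
print(len(state["messages"]))
```

Each node contributed a one-element list, yet the history holds every message in order.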

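A First Look at Checkpointing and Persistence

Picture this scenario: your agent has already run 10 steps when an API call suddenly fails. You want to retry from that last step instead of starting from scratch. That is what checkpointing is for.

When you compile a graph with a checkpointer, LangGraph saves the graph state after each step:

python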

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()

graph = builder.compile(
    checkpointer=checkpointer
)

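# The graph now saves its state after every step, keyed by the thread_id you pass in the config.
# You can pause, resume, or replay from a saved checkpoint.
# MemorySaver keeps checkpoints in process memory only; use a database-backed checkpointer to survive restarts.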

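Benefits of checkpointing:

✅ Fault tolerance: after a failure, resume from the last checkpoint instead of redoing everything

✅ Human in the loop: pause the agent mid-run so a person can review or modify the state

✅ Multi-turn conversations: keep the same agent state (a session) across multiple requests

✅ Debugging: inspect the state at each checkpoint to trace problems

We cover the details in the third post of this series. For now, all you need to know is that checkpointing makes your agent reliable, resumable, and debuggable.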
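
As a rough illustration of the idea (not of LangGraph's checkpointer API), the sketch below saves a snapshot after every successful step and resumes from the latest one after a simulated timeout. InMemoryCheckpoints, run_steps, and the next_step counter are all hypothetical names invented for this example.

```python
import copy

class InMemoryCheckpoints:
    """Hypothetical store: one list of state snapshots per thread_id."""

    def __init__(self):
        self._by_thread = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._by_thread.setdefault(thread_id, []).append(copy.deepcopy(state))

    def latest(self, thread_id: str):
        history = self._by_thread.get(thread_id)
        return copy.deepcopy(history[-1]) if history else None

def run_steps(steps, state, store, thread_id):
    """Run the remaining steps, saving a snapshot after each successful one."""
    for step in steps[state["next_step"]:]:
        state = step(state)
        state["next_step"] += 1
        store.save(thread_id, state)
    return state

attempts = {"b": 0}

def step_a(state):
    return {**state, "log": state["log"] + ["a"]}

def step_b(state):
    attempts["b"] += 1
    if attempts["b"] == 1:
        raise TimeoutError("simulated API timeout")  # fails on the first try only
    return {**state, "log": state["log"] + ["b"]}

def step_c(state):
    return {**state, "log": state["log"] + ["c"]}

store = InMemoryCheckpoints()
steps = [step_a, step_b, step_c]

try:
    run_steps(steps, {"log": [], "next_step": 0}, store, "user_123")
except TimeoutError:
    pass  # step_a's result is already checkpointed

# Resume from the last snapshot instead of starting over.
resumed = run_steps(steps, store.latest("user_123"), store, "user_123")
print(resumed["log"])
```

Step a runs only once: the second call starts at step b because the snapshot recorded how far the first run got.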

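Building Your First Agent

Now let's build a working agent. It answers questions about the weather and gives clothing advice based on the weather.

Choosing an LLM (Model)

First, you need an LLM. We use OpenAI's GPT-4o here because its tool calling is reliable, but you can also use Claude, Gemini, and others.

Python:

python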

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o",
    temperature=0,  # favor consistent answers over creative ones
    api_key="your-openai-api-key"  # placeholder; prefer the OPENAI_API_KEY environment variable
)

TypeScript:

typescript

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o",
  temperature: 0,
  apiKey: "your-openai-api-key",
});

Defining Tools

Tools are the external capabilities an agent can use. The LLM reads each tool's description and decides whether to call it and which arguments to pass.

Python:

python

from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather in a location.
    
    Args:
        location: The city name (e.g., 'Beijing', 'New York')
    
    Returns:
        Weather description
    """
    # This is a mock; a real implementation would call a weather API
    weather_data = {
        "Beijing": "Sunny, 15°C",
        "New York": "Rainy, 10°C",
        "Tokyo": "Cloudy, 12°C",
    }
    return weather_data.get(location, "Weather data not available")

@tool
def get_clothing_recommendation(weather: str) -> str:
    """Get clothing recommendations based on weather.
    
    Args:
        weather: Weather description (e.g., 'Sunny, 15°C')
    
    Returns:
        Clothing recommendation
    """
    if "Sunny" in weather:
        return "Wear light clothes, sunscreen, and sunglasses"
    elif "Rainy" in weather:
        return "Bring an umbrella and wear waterproof jacket"
    elif "Cloudy" in weather:
        return "Light jacket recommended"
    return "Check the weather details"

tools = [get_weather, get_clothing_recommendation]

TypeScript:

typescript

import * as z from "zod";
import { tool } from "langchain";

const getWeather = tool(
  ({ location }) => {
    const weatherData: Record<string, string> = {
      Beijing: "Sunny, 15°C",
      "New York": "Rainy, 10°C",
      Tokyo: "Cloudy, 12°C",
    };
    return weatherData[location] || "Weather data not available";
  },
  {
    name: "get_weather",
    description: "Get the current weather in a location",
    schema: z.object({
      location: z.string().describe("The city name (e.g., 'Beijing', 'New York')"),
    }),
  }
);

const getClothingRecommendation = tool(
  ({ weather }) => {
    if (weather.includes("Sunny")) {
      return "Wear light clothes, sunscreen, and sunglasses";
    } else if (weather.includes("Rainy")) {
      return "Bring an umbrella and wear waterproof jacket";
    } else if (weather.includes("Cloudy")) {
      return "Light jacket recommended";
    }
    return "Check the weather details";
  },
  {
    name: "get_clothing_recommendation",
    description: "Get clothing recommendations based on weather",
    schema: z.object({
      weather: z.string().describe("Weather description"),
    }),
  }
);

const tools = [getWeather, getClothingRecommendation];

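Note: the key part of a tool is its description. The LLM reads it to decide when to call the tool and with which arguments, so the description should be clear, concise, and specific.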
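
To see why the description matters, it helps to look at what kind of information the model is given about a tool. The sketch below builds a simplified description from a function's name, signature, and docstring using only the standard library. describe_tool is a hypothetical helper for illustration; the schema that the @tool decorator actually generates is richer and has a different shape.

```python
import inspect

def describe_tool(func) -> dict:
    """Build a simplified tool description from a function's signature and docstring."""
    signature = inspect.signature(func)
    params = {
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in signature.parameters.items()
    }
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.split("\n")[0],  # first docstring line only
        "parameters": params,
    }

def get_weather(location: str) -> str:
    """Get the current weather in a location."""
    return "Sunny, 15°C"

spec = describe_tool(get_weather)
print(spec)
```

The name, description, and parameter names are all the model has to go on, so a vague docstring or an unclear parameter name directly reduces the chance that the tool is called correctly.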

Writing the Execution Logic: Tool Calling Loop

Now we put everything together. This is the ReAct loop implemented as a graph.

Python:

python

from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage  # used when invoking the graph

# Create the graph builder
builder = StateGraph(MessagesState)

# Create the tool execution node (LangGraph ships a prebuilt one)
tool_node = ToolNode(tools)

# Define the LLM node (the Reasoning and Acting parts of ReAct)
def llm_node(state: MessagesState):
    """Let the LLM reason and decide what to do."""
    # Bind the tools to the LLM so it knows which tools are available
    model_with_tools = model.bind_tools(tools)
    
    # The LLM reads the message history and returns a new message
    response = model_with_tools.invoke(state["messages"])
    return {"messages": [response]}

# Condition: do we need to keep calling tools?
def should_continue(state: MessagesState):
    """
    Check whether the last message contains tool calls.
    If it does, go to the tool execution node.
    Otherwise, end the run.
    """
    messages = state["messages"]
    last_message = messages[-1]
    
    # Check whether the last message has tool calls
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"  # go execute the tools
    return END  # task complete

# Add the nodes to the graph
builder.add_node("llm", llm_node)
builder.add_node("tools", tool_node)

# Add the edges (the flow between nodes)
builder.add_edge(START, "llm")  # enter the LLM node from the start
builder.add_conditional_edges(
    "llm",
    should_continue,
    {"tools": "tools", END: END}  # route to the tools or to the end
)
builder.add_edge("tools", "llm")  # after the tools run, go back to the LLM

# Compile the graph with a checkpointer for persistence
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

TypeScript:

typescript

import { StateGraph, MessagesState, START, END, MemorySaver } from "@langchain/langgraph";
import { HumanMessage } from "@langchain/core/messages";
import { ToolNode } from "@langchain/langgraph/prebuilt";

// Create the graph
const builder = new StateGraph(MessagesState)
  .addNode("llm", async (state) => {
    // Bind the tools to the LLM
    const modelWithTools = model.bindTools(tools);
    const response = await modelWithTools.invoke(state.messages);
    return { messages: [response] };
  })
  .addNode("tools", new ToolNode(tools))
  .addEdge(START, "llm")
  .addConditionalEdges("llm", (state) => {
    // Decide whether another tool call is needed
    const lastMessage = state.messages[state.messages.length - 1];
    if (
      "tool_calls" in lastMessage &&
      Array.isArray(lastMessage.tool_calls) &&
      lastMessage.tool_calls.length > 0
    ) {
      return "tools";
    }
    return END;
  })
  .addEdge("tools", "llm");

// Compile the graph with a checkpointer
const checkpointer = new MemorySaver();
const graph = builder.compile({ checkpointer });

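What this code does:

llm_node: lets the LLM read the message history, reason, and decide the next step (call a tool or answer directly)
should_continue: checks whether the LLM's reply contains tool calls
Edges: form a loop → the LLM reasons → tools are called → back to the LLM → repeat until the LLM replies without a tool call

Complete Code Example

The following is a complete example that you can run as-is once the required packages are installed and the OPENAI_API_KEY environment variable is set:

Python (full version):

python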

from langchain_openai import ChatOpenAI
from langchain.tools import tool
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage
import os

# Initialize the LLM
model = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    api_key=os.getenv("OPENAI_API_KEY")
)

# Define the tools
@tool
def get_weather(location: str) -> str:
    """Get the current weather in a location."""
    weather_data = {
        "Beijing": "Sunny, 15°C",
        "New York": "Rainy, 10°C",
        "Tokyo": "Cloudy, 12°C",
    }
    return weather_data.get(location, "Weather data not available")

@tool
def get_clothing_recommendation(weather: str) -> str:
    """Get clothing recommendations based on weather."""
    if "Sunny" in weather:
        return "Wear light clothes, sunscreen, and sunglasses"
    elif "Rainy" in weather:
        return "Bring an umbrella and wear waterproof jacket"
    elif "Cloudy" in weather:
        return "Light jacket recommended"
    return "Check the weather details"

tools = [get_weather, get_clothing_recommendation]

# Build the agent graph
builder = StateGraph(MessagesState)

def llm_node(state: MessagesState):
    """LLM node: reason and decide what to do."""
    model_with_tools = model.bind_tools(tools)
    response = model_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: MessagesState):
    """Decide whether to keep calling tools."""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

# Add the nodes and edges
tool_node = ToolNode(tools)
builder.add_node("llm", llm_node)
builder.add_node("tools", tool_node)

builder.add_edge(START, "llm")
builder.add_conditional_edges(
    "llm",
    should_continue,
    {"tools": "tools", END: END}
)
builder.add_edge("tools", "llm")

# Compile
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# Run the agent
if __name__ == "__main__":
    config = {"configurable": {"thread_id": "user_123"}}
    
    # Invoke the agent
    result = graph.invoke(
        {"messages": [HumanMessage(content="What's the weather in Beijing and what should I wear?")]},
        config=config
    )
    
    # Print the final answer (the last message)
    print("\n=== Final Answer ===")
    print(result["messages"][-1].content)

TypeScript (full version):

typescript

import { ChatOpenAI } from "@langchain/openai";
import * as z from "zod";
import { tool } from "langchain";
import { StateGraph, MessagesState, START, END, MemorySaver } from "@langchain/langgraph";
import { ToolNode } from "@langchain/langgraph/prebuilt";
import { HumanMessage } from "@langchain/core/messages";

// Initialize the LLM
const model = new ChatOpenAI({
  model: "gpt-4o",
  temperature: 0,
  apiKey: process.env.OPENAI_API_KEY,
});

// Define the tools
const getWeather = tool(
  ({ location }) => {
    const weatherData: Record<string, string> = {
      Beijing: "Sunny, 15°C",
      "New York": "Rainy, 10°C",
      Tokyo: "Cloudy, 12°C",
    };
    return weatherData[location] || "Weather data not available";
  },
  {
    name: "get_weather",
    description: "Get the current weather in a location",
    schema: z.object({
      location: z.string().describe("The city name"),
    }),
  }
);

const getClothingRecommendation = tool(
  ({ weather }) => {
    if (weather.includes("Sunny")) {
      return "Wear light clothes, sunscreen, and sunglasses";
    } else if (weather.includes("Rainy")) {
      return "Bring an umbrella and wear waterproof jacket";
    } else if (weather.includes("Cloudy")) {
      return "Light jacket recommended";
    }
    return "Check the weather details";
  },
  {
    name: "get_clothing_recommendation",
    description: "Get clothing recommendations based on weather",
    schema: z.object({
      weather: z.string().describe("Weather description"),
    }),
  }
);

const tools = [getWeather, getClothingRecommendation];

// Build the agent graph
const builder = new StateGraph(MessagesState)
  .addNode("llm", async (state) => {
    const modelWithTools = model.bindTools(tools);
    const response = await modelWithTools.invoke(state.messages);
    return { messages: [response] };
  })
  .addNode("tools", new ToolNode(tools))
  .addEdge(START, "llm")
  .addConditionalEdges("llm", (state) => {
    const lastMessage = state.messages[state.messages.length - 1];
    if ("tool_calls" in lastMessage && Array.isArray(lastMessage.tool_calls) && lastMessage.tool_calls.length > 0) {
      return "tools";
    }
    return END;
  })
  .addEdge("tools", "llm");

// Compile
const checkpointer = new MemorySaver();
const graph = builder.compile({ checkpointer });

// Run the agent
(async () => {
  const config = { configurable: { thread_id: "user_123" } };
  
  const result = await graph.invoke(
    {
      messages: [
        new HumanMessage("What's the weather in Beijing and what should I wear?"),
      ],
    },
    config
  );
  
  console.log("\n=== Final Answer ===");
  console.log(result.messages[result.messages.length - 1].content);
})();

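Running and Testing

Local Debugging Tips

Run the code above and you should see output similar to this (the exact wording varies between runs and models):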

=== Final Answer ===
Based on the current weather in Beijing (Sunny, 15°C), you should:
1. Wear light clothes since it's sunny
2. Apply sunscreen and wear sunglasses to protect from the sun
3. Consider a light jacket as 15°C might feel a bit cool in the morning or evening

Enjoy your day in Beijing!

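Let's add some debugging code to see the agent's full execution:

python

# Python
from langchain_core.messages import AIMessage, ToolMessage

# Run the agent (reuses graph and config from the full example;
# pass a new thread_id if you want to start from an empty history)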
result = graph.invoke(
    {"messages": [HumanMessage(content="What's the weather in Beijing and what should I wear?")]},
    config=config
)

# Print every message to see the complete ReAct loop
print("\n=== Full Agent Conversation ===")
for i, msg in enumerate(result["messages"]):
    msg_type = msg.__class__.__name__
    
    if msg_type == "HumanMessage":
        print(f"\n[{i}] 👤 Human: {msg.content}")
    elif msg_type == "AIMessage":
        print(f"\n[{i}] 🤖 AI: {msg.content}")
        if hasattr(msg, "tool_calls") and msg.tool_calls:
            print("    Tool Calls:")
            for tool_call in msg.tool_calls:
                print(f"      - {tool_call['name']}: {tool_call['args']}")
    elif msg_type == "ToolMessage":
        print(f"\n[{i}] 🔧 Tool Result: {msg.content}")

The output will look similar to:

=== Full Agent Conversation ===

[0] 👤 Human: What's the weather in Beijing and what should I wear?

[1] 🤖 AI: I'll help you find out the weather in Beijing and get a clothing recommendation.
    Tool Calls:
      - get_weather: {'location': 'Beijing'}

[2] 🔧 Tool Result: Sunny, 15°C

[3] 🤖 AI: Now let me get the clothing recommendation based on the weather.
    Tool Calls:
      - get_clothing_recommendation: {'weather': 'Sunny, 15°C'}

[4] 🔧 Tool Result: Wear light clothes, sunscreen, and sunglasses

[5] 🤖 AI: Based on the current weather in Beijing (Sunny, 15°C), you should...

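That is the complete ReAct loop! 🎯

Inspecting the Execution

Beyond printing messages, LangSmith offers much richer visualization. A later post covers it in detail. For now, you only need to know how to turn on LangSmith tracing:

python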

import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-langsmith-api-key"

# Then run your code; the execution trace is sent to LangSmith automatically
# You can view the full execution tree at https://smith.langchain.com

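Troubleshooting Common Problems

Problem 1: A tool is never called

Symptom: the agent's reply contains no tool_calls; it answers directly without calling a tool.

Cause: the LLM may not have understood what the tool is for.

Solutions:

Check that the tool's description is clear (avoid vague descriptions)
Make sure the model supports tool calling (GPT-4, Claude 3+, and Gemini Pro all do)
State explicitly in the system message that these tools are available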

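Problem 2: A tool fails with a stack trace

Symptom: the tool raises an exception.

Solutions:

Add try/except inside the tool and return a useful error message
Return the error to the LLM as the tool result so that it can retry or adjust its arguments

An improved tool definition:

python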

@tool
def get_weather(location: str) -> str:
    """Get the current weather in a location."""
    try:
        weather_data = {
            "Beijing": "Sunny, 15°C",
            "New York": "Rainy, 10°C",
        }
        result = weather_data.get(location)
        if result is None:
            return f"Error: Weather data for {location} not found. Available locations: {list(weather_data.keys())}"
        return result
    except Exception as e:
        return f"Error calling weather API: {str(e)}"
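
If several tools need the same treatment, the try/except can be factored into a decorator. The sketch below is one possible way to do that in plain Python. safe_tool and divide are hypothetical names for this example, and whether the wrapper composes cleanly with the @tool decorator in your installed version is an assumption you should verify.

```python
import functools

def safe_tool(func):
    """Wrap a tool so exceptions come back as a readable string instead of propagating."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            return f"Error in {func.__name__}: {type(exc).__name__}: {exc}"
    return wrapper

@safe_tool
def divide(a: float, b: float) -> str:
    """Divide a by b."""
    return str(a / b)

ok_result = divide(6, 3)
error_result = divide(1, 0)
print(ok_result)
print(error_result)
```

Returning the error as text keeps the loop alive: the LLM sees what went wrong in the next reasoning step and can try different arguments.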

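Problem 3: The agent gets stuck in an infinite loop

Symptom: the agent keeps calling tools and never stops.

Cause: the should_continue logic is wrong, or the LLM has not learned when to stop.

Solutions:

Add a maximum step limit
Improve the system prompt so it tells the LLM explicitly when to stop
Check whether the tool's return value gives the LLM enough information to decide

Adding a step limit:

python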

MAX_ITERATIONS = 10

def should_continue(state: MessagesState):
    last_message = state["messages"][-1]
    
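    # Step limit: each tool round adds roughly 2 messages (AI + tool).
    # Note: with a checkpointer, earlier turns on the same thread count too.
    if len(state["messages"]) > MAX_ITERATIONS * 2: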
        return END
    
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END
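
Counting all messages works for a single question, but on a long-lived thread the history from earlier turns also counts toward the limit. One alternative, sketched below, is to count only the tool rounds since the most recent human message. The Msg dataclass is a simplified stand-in for the real message classes, and the function here takes a plain list rather than a MessagesState; both are assumptions for illustration. LangGraph also accepts a recursion_limit entry in the run config that caps the number of steps per invocation; check the documentation for its current default and error behavior.

```python
from dataclasses import dataclass, field

@dataclass
class Msg:
    role: str  # "human", "ai", or "tool"
    tool_calls: list = field(default_factory=list)

MAX_TOOL_ROUNDS = 3

def tool_rounds_this_turn(messages: list) -> int:
    """Count AI messages with tool calls after the most recent human message."""
    count = 0
    for msg in reversed(messages):
        if msg.role == "human":
            break
        if msg.role == "ai" and msg.tool_calls:
            count += 1
    return count

def should_continue(messages: list) -> str:
    last = messages[-1]
    if last.role == "ai" and last.tool_calls:
        if tool_rounds_this_turn(messages) > MAX_TOOL_ROUNDS:
            return "end"  # give up on this turn
        return "tools"
    return "end"

call = {"name": "get_weather", "args": {"location": "Beijing"}}
tool_round = [Msg("ai", [call]), Msg("tool")]

# A long earlier turn followed by a fresh question: still allowed to use tools.
long_history = [Msg("human")] + tool_round * 5 + [Msg("ai")]
fresh_turn = long_history + [Msg("human"), Msg("ai", [call])]

# A single turn that keeps requesting tools: stopped after the limit.
runaway_turn = [Msg("human")] + tool_round * 3 + [Msg("ai", [call])]

print(should_continue(fresh_turn), should_continue(runaway_turn))
```

Note that stopping while the last AI message still holds an unanswered tool call leaves the history in an incomplete state, so a production version would typically append a closing message or raise an error instead of ending silently.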

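Summary: What to Learn Next

Congratulations! You now understand the core principles of an agent and have built your first working one.

You learned:

✅ The dynamic nature of agents (the ReAct loop)

✅ Why LangGraph is needed

✅ The concepts of State, Nodes, and Edges

✅ MessagesState and Checkpointing

✅ How to define tools and write an agent

✅ Basic debugging techniques

But this is only the beginning. From here you can:

Experiment: change tool descriptions, add new tools, alter the system prompt, and watch how the agent's behavior changes
Productionize: add error handling, logging, and performance tuning to your tools
Continue with part two: custom State, Middleware, and multi-agent systems

Part two preview: "Advanced Agent Design: State Management, Middleware, and Multi-Agent Collaboration"

In part two, you will learn:

How to define complex State structures that fit your business logic
How Middleware lets you intercept LLM calls for dynamic model selection, error handling, and more
How to build systems in which multiple agents collaborate
Memory management and context optimization

Stay tuned!

Related documentation: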