Building an AI Agent from Scratch: A Hands-On LangChain + TypeScript Guide

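This is not a "Hello World"-level tutorial on calling an LLM API. What we are building is an AI agent that can call tools on its own, stream its output, remember context, and even show its reasoning process. We build it from scratch, moving from simple to advanced, and explain why each step is done the way it is.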

1. Project setup: building the skeleton
2. Understanding agents: how they differ from a plain LLM call
3. Defining tools: giving the AI "hands"
4. Creating the agent: connecting the brain to the hands
5. Streaming output: don't keep users waiting
6. Multi-turn conversation: letting the agent remember context
7. Extended Thinking: seeing the model's reasoning
8. Wrap-up and outlook

1. Project setup: building the skeleton

1.1 Create the project

mkdir lingshi && cd lingshi
pnpm init

Nothing special here: this gives you a package.json, and you add dependencies to it as you go.

1.2 Install dependencies

# Runtime dependencies
pnpm add langchain @langchain/anthropic @langchain/core @langchain/langgraph deepagents zod dotenv

# Dev dependencies (TypeScript tooling)
pnpm add -D typescript tsx @types/node

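A quick rundown of what each package does:

Package	Purpose
`langchain`	Core LangChain framework
`@langchain/anthropic`	ChatModel for Anthropic and Anthropic-compatible APIs
`@langchain/core`	Core abstractions, including the `tool` function used to define tools
`@langchain/langgraph`	Graph engine that runs the agent, plus `MemorySaver`
`deepagents`	Provides `createDeepAgent`, which simplifies agent creation
`zod`	Runtime type validation, used to define tool parameter schemas
`dotenv`	Loads environment variables from the `.env` file
`tsx`	Runs TypeScript directly, without a separate compile step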

1.3 Configure TypeScript

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "noEmit": true,
    "types": ["node"]
  },
  "include": ["src/**/*"]
}

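Key settings:

module: "ESNext": use ES modules (import/export). This pairs with "type": "module" in package.json, so make sure package.json declares it.
moduleResolution: "bundler": a module resolution strategy suited to modern tooling such as tsx and bundlers
noEmit: true: we run the code directly with tsx, so tsc never needs to emit compiled output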

1.4 Environment variables

Create a .env file (secrets like these must never be committed to Git):

ANTHROPIC_API_KEY=sk-your-key-here
ANTHROPIC_BASE_URL=https://your-proxy-server.com/anthropic
MODEL_NAME=qwen3.7-plus

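A small trick is used here. ANTHROPIC_BASE_URL points to a proxy server, and the model actually running behind it is qwen3.7-plus. Because the endpoint is compatible with the Anthropic protocol, ChatAnthropic can call it directly.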
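The agent code reads these variables through process.env. If one is missing, the problem often shows up much later as an opaque authentication or connection error. Below is a minimal fail-fast sketch built on two hypothetical helpers, requireEnv and optionalEnv. They are not dotenv or LangChain APIs. Both take the env object as a parameter, which keeps them easy to test:

```typescript
// Hypothetical helpers (not part of dotenv or LangChain): read variables from an
// env-like object and fail fast with a clear message when a required one is missing.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Optional variables fall back to a default instead of throwing.
function optionalEnv(env: Env, name: string, fallback: string): string {
  const value = env[name];
  return value === undefined || value.trim() === '' ? fallback : value;
}
```

In agents.ts, you would call these after loading .env, for example requireEnv(process.env, 'ANTHROPIC_API_KEY') and optionalEnv(process.env, 'MODEL_NAME', 'qwen3.7-plus').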

1.5 Project structure

The final file layout is simple:

lingshi/
├── src/
│   ├── tools.ts      # Tool definitions (calculator, current time)
│   ├── agents.ts     # Agent creation and configuration
│   └── index.ts      # Entry point; runs the test scenarios
├── .env              # Environment variables (API key, etc.)
├── package.json
└── tsconfig.json

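Each of the three files has a single job; we'll go through them one by one.

2. Understanding agents: how they differ from a plain LLM call

Before writing any code, let's settle the core question: what exactly is an agent?

A plain LLM call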

User input → LLM → returns a result in one go, done

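An LLM is essentially a "text continuation machine". You give it a prompt, it produces a reply, and that's it. On its own it cannot check the weather for you, cannot reliably do arithmetic, and cannot reach any external system.

An agent call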

User input → LLM → needs a tool?
                    ├─ Yes → run tool → feed result back to LLM → decide again...
                    └─ No  → return final reply

An agent adds a loop on top of the LLM:

while (the LLM decides it still needs a tool) {
  run the tool → feed the result back to the LLM
}
return the LLM's final answer

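Example: the user asks "calculate 128 × 47 for me".

The LLM sees that a calculator tool is available and decides to call it → calculator({ a: 128, b: 47, operation: 'multiply' })
The tool returns "128 multiply 47 = 6016"
The LLM takes the result and writes a natural-language reply: "128 × 47 = 6016"

Agent = LLM + tools + loop. That's all there is to it.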
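The whole idea fits in a few lines of plain TypeScript. This is a dependency-free sketch, not LangChain's implementation. A scripted function (scriptedLlm) stands in for the LLM: it first requests the calculator, then answers from the tool result. The names runAgent, FakeLlm and maxSteps were made up for this example:

```typescript
// Dependency-free sketch of "Agent = LLM + tools + loop".
// The "LLM" here is a scripted function, not a real model or LangChain API.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };

type Message =
  | { role: 'user'; content: string }
  | { role: 'ai'; content: string; toolCalls: ToolCall[] }
  | { role: 'tool'; toolCallId: string; content: string };

type AiMessage = Extract<Message, { role: 'ai' }>;
type FakeLlm = (history: Message[]) => AiMessage;
type ToolFn = (args: Record<string, unknown>) => string;

function runAgent(
  llm: FakeLlm,
  tools: Record<string, ToolFn>,
  userInput: string,
  maxSteps = 5, // guard against a model that never stops requesting tools
): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: 'user', content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = llm(history);
    history.push(reply);
    // No tool requested: this is the final answer, so the loop ends.
    if (reply.toolCalls.length === 0) return { answer: reply.content, history };
    // Otherwise run every requested tool and feed each result back as a tool message.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? tool(call.args) : `Error: unknown tool "${call.name}"`;
      history.push({ role: 'tool', toolCallId: call.id, content: result });
    }
  }
  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}

// Scripted LLM: request the calculator first, then answer from the tool result.
const scriptedLlm: FakeLlm = (history) => {
  const last = history[history.length - 1];
  if (last.role === 'tool') {
    return { role: 'ai', content: `128 × 47 = ${last.content}`, toolCalls: [] };
  }
  return {
    role: 'ai',
    content: '',
    toolCalls: [{ id: 'call_1', name: 'calculator', args: { a: 128, b: 47 } }],
  };
};

const tools: Record<string, ToolFn> = {
  calculator: (args) => String(Number(args.a) * Number(args.b)),
};
```

Calling runAgent(scriptedLlm, tools, 'Calculate 128 × 47') walks through user → ai (tool call) → tool → ai (answer). Any real loop should keep the maxSteps guard. Without it, a model that keeps requesting tools would never terminate.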

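3. Defining tools: giving the AI "hands"

The full tool code is in src/tools.ts.

3.1 How tool calling works

Tool calling is the core mechanism behind an agent. It works in five steps:

The user sends a message → the LLM analyzes whether a tool is needed
The LLM returns a tool_call → containing the tool name and the arguments as JSON
The agent framework runs the tool function → and gets the result
The tool result is fed back to the LLM → as a new message
The LLM combines the results → and generates the final reply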

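Steps 2 through 4 can repeat several times, because after seeing a tool result the LLM may request another tool. This repetition is what's called the Agent Loop.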
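To make the five steps concrete, the sketch below writes out the conversation they produce as typed data. The block shapes are modeled on the Anthropic Messages API: a tool_use block in an assistant message, and a tool_result block in the following user message, linked by id. They are simplified to only the fields used here. The id value and the unmatchedToolResults check are made up for illustration:

```typescript
// Simplified block shapes modeled on the Anthropic Messages API (illustrative only;
// the real API allows tool_result blocks only in user messages).
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type ContentBlock = TextBlock | ToolUseBlock | ToolResultBlock;
type ApiMessage = { role: 'user' | 'assistant'; content: string | ContentBlock[] };

// Steps 1-5 for "128 × 47", written out as the conversation the framework builds.
const transcript: ApiMessage[] = [
  // 1. The user's message.
  { role: 'user', content: 'Calculate 128 times 47' },
  // 2. The LLM replies with a tool_use block: tool name + JSON arguments.
  {
    role: 'assistant',
    content: [{ type: 'tool_use', id: 'toolu_01', name: 'calculator', input: { a: 128, b: 47, operation: 'multiply' } }],
  },
  // 3-4. The framework runs the tool and sends the result back, linked by tool_use_id.
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu_01', content: '128 multiply 47 = 6016' }] },
  // 5. The LLM writes the final reply.
  { role: 'assistant', content: [{ type: 'text', text: '128 × 47 = 6016' }] },
];

// An invariant the framework upholds: every tool_result answers an earlier tool_use.
function unmatchedToolResults(messages: ApiMessage[]): string[] {
  const seen = new Set<string>();
  const unmatched: string[] = [];
  for (const msg of messages) {
    if (typeof msg.content === 'string') continue;
    for (const block of msg.content) {
      if (block.type === 'tool_use') seen.add(block.id);
      if (block.type === 'tool_result' && !seen.has(block.tool_use_id)) unmatched.push(block.tool_use_id);
    }
  }
  return unmatched;
}
```
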

3.2 The calculator tool

import { tool } from '@langchain/core/tools';
import { z } from 'zod';

export const calculatorTool = tool(
  // First argument: the function that runs the tool
  async ({ a, b, operation }) => {
    let result: number;
    switch (operation) {
      case 'add':      result = a + b; break;
      case 'subtract': result = a - b; break;
      case 'multiply': result = a * b; break;
      case 'divide':
        if (b === 0) return 'Error: division by zero';
        result = a / b;
        break;
      default:
        return `Error: unsupported operation "${operation}"`;
    }
    return `${a} ${operation} ${b} = ${result}`;
  },
  // Second argument: the tool's metadata
  {
    name: 'calculator',
    description: 'Performs basic arithmetic (add, subtract, multiply, divide) on two numbers',
    schema: z.object({
      a: z.number().describe('The first number'),
      b: z.number().describe('The second number'),
      operation: z
        .enum(['add', 'subtract', 'multiply', 'divide'])
        .describe('The operation to perform: add, subtract, multiply, or divide'),
    }),
  }
);

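Each tool has three elements:

Element	Description
`name`	The tool's unique identifier; the LLM calls the tool by this name
`description`	Tells the LLM what the tool does; the LLM uses it to decide whether to call the tool
`schema`	Parameter types defined with Zod; the framework converts them to JSON Schema and sends that to the LLM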
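The name, description and schema are what the LLM sees; the execution function itself is ordinary code. One design option, sketched here and not taken from the article, is to move that logic into a pure function so it can be unit-tested without LangChain or a model. The function name calculate is hypothetical:

```typescript
// Hypothetical refactor: the calculator's logic as a pure function,
// testable without LangChain, Zod, or a model.
type Operation = 'add' | 'subtract' | 'multiply' | 'divide';

function calculate(a: number, b: number, operation: Operation): string {
  let result: number;
  switch (operation) {
    case 'add': result = a + b; break;
    case 'subtract': result = a - b; break;
    case 'multiply': result = a * b; break;
    case 'divide':
      // Return an error string instead of throwing, so the LLM receives a readable message.
      if (b === 0) return 'Error: division by zero';
      result = a / b;
      break;
    default:
      return `Error: unsupported operation "${String(operation)}"`;
  }
  return `${a} ${operation} ${b} = ${result}`;
}
```

With this refactor, the tool's execution function in tools.ts shrinks to async ({ a, b, operation }) => calculate(a, b, operation), and the metadata object stays as it is.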

3.3 Why Zod?

Zod is a TypeScript-first runtime validation library. In Deep Agents, the Zod schema has a key job: telling the LLM how to pass arguments.

z.object({
  a: z.number().describe('The first number'),
  b: z.number().describe('The second number'),
  operation: z.enum(['add', 'subtract', 'multiply', 'divide'])
    .describe('The operation to perform'),
})

Internally, the framework converts this Zod schema into JSON Schema, which looks roughly like this:

{
  "type": "object",
  "properties": {
    "a": { "type": "number", "description": "The first number" },
    "b": { "type": "number", "description": "The second number" },
    "operation": {
      "type": "string",
      "enum": ["add", "subtract", "multiply", "divide"],
      "description": "The operation to perform"
    }
  },
  "required": ["a", "b", "operation"]
}

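From this JSON Schema, the LLM knows that calling calculator requires a and b (numbers) and operation (one of the enum strings). The text in .describe() is key to the LLM understanding what each parameter means. Without it, the LLM can only guess.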
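To make the mapping concrete, here is a toy converter that covers only the field kinds calculatorTool uses. It is not how Zod or LangChain perform the conversion; the framework handles that internally. The FieldSpec format was invented purely for this illustration:

```typescript
// Toy illustration of the schema → JSON Schema step. FieldSpec is an invented
// stand-in for Zod types; real conversion is done by the framework.
type FieldSpec =
  | { kind: 'number'; description: string }
  | { kind: 'enum'; values: string[]; description: string };

type JsonSchemaProperty =
  | { type: 'number'; description: string }
  | { type: 'string'; enum: string[]; description: string };

function toJsonSchema(fields: Record<string, FieldSpec>) {
  const properties: Record<string, JsonSchemaProperty> = {};
  for (const [name, spec] of Object.entries(fields)) {
    // A number maps to "number"; an enum maps to a string restricted to its values.
    properties[name] =
      spec.kind === 'number'
        ? { type: 'number', description: spec.description }
        : { type: 'string', enum: spec.values, description: spec.description };
  }
  // As with a plain z.object(), every field is treated as required.
  return { type: 'object' as const, properties, required: Object.keys(fields) };
}

const calculatorSchema = toJsonSchema({
  a: { kind: 'number', description: 'The first number' },
  b: { kind: 'number', description: 'The second number' },
  operation: {
    kind: 'enum',
    values: ['add', 'subtract', 'multiply', 'divide'],
    description: 'The operation to perform',
  },
});
```
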

3.4 A tool with no parameters: getting the current time

Not every tool needs parameters. Getting the current time is a typical example:

export const getCurrentTimeTool = tool(
  async () => {
    const now = new Date();
    return `Current time: ${now.toLocaleString('zh-CN', { timeZone: 'Asia/Shanghai' })}`;
  },
  {
    name: 'get_current_time',
    description: 'Gets the current system time (Beijing time)',
    schema: z.object({}),  // empty schema → the LLM knows no arguments are needed
  }
);

z.object({}) is an empty object schema, which tells the LLM to call the tool without any arguments.
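Because the tool reads the clock itself, its output changes on every call, which makes it awkward to test. One option, sketched here as an assumption and not the article's code, is to move the formatting into a hypothetical helper, formatBeijingTime, that takes the Date as an argument. The explicit timeZone keeps the result in Beijing time regardless of the server's local time zone:

```typescript
// Hypothetical helper: format a given instant as Beijing time.
// Passing the Date in (instead of calling new Date() inside) makes it deterministic to test.
function formatBeijingTime(now: Date): string {
  // An explicit timeZone means the server's own time zone setting does not matter.
  return now.toLocaleString('zh-CN', { timeZone: 'Asia/Shanghai', hour12: false });
}
```

The tool's execution function would then just call formatBeijingTime(new Date()).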

4. Creating the agent: connecting the brain to the hands

The code is in src/agents.ts.

4.1 Configure the ChatModel

import 'dotenv/config'; // load .env into process.env before it is read below
import { ChatAnthropic } from '@langchain/anthropic';

const model = new ChatAnthropic({
  model: process.env.MODEL_NAME || 'qwen3.7-plus',
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  anthropicApiUrl: process.env.ANTHROPIC_BASE_URL,
  streaming: true,
  maxTokens: 10000,
  thinking: {
    type: 'enabled',
    budget_tokens: 5000,
  },
});

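ChatAnthropic is one of LangChain's ChatModel implementations. ChatModel is LangChain's unified abstraction for chat models and provides two core methods:

.invoke(messages): an asynchronous call whose Promise resolves once the complete reply is ready
.stream(messages): a streaming call that returns the reply chunk by chunk

Two settings here are worth noting:

streaming: true: enables streaming output at the model level (covered in detail later)
thinking: enables Extended Thinking, so the model reasons internally before replying (covered in the last section)

4.2 Create the agent instance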

import { createDeepAgent } from 'deepagents';
import { MemorySaver } from '@langchain/langgraph';
import { calculatorTool, getCurrentTimeTool } from './tools';

export const agent = createDeepAgent({
  model,
  tools: [calculatorTool, getCurrentTimeTool],
  systemPrompt: 'You are a helpful AI assistant. When the user needs a math calculation or the current time, use the corresponding tool.',
  checkpointer: new MemorySaver(),
});

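createDeepAgent builds an agent on top of a LangGraph graph. Besides the tools you pass in, deepagents also registers some built-in tools of its own, such as a to-do planner and virtual file-system tools. The core loop looks like this: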

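[User message] → [LLM] → call a tool?
                        ├─ Yes → [run tool] → back to LLM
                        └─ No  → [return final reply]

What the four parameters mean:

Parameter	Description
`model`	ChatModel instance: the agent's "brain"
`tools`	Array of tools; the agent decides on its own which one to call
`systemPrompt`	System prompt that defines the agent's role
`checkpointer`	Memory store; `MemorySaver` keeps everything in memory (lost on restart), and in production you can switch to `PostgresSaver`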

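5. Streaming output: don't keep users waiting

The code is in src/index.ts.

5.1 invoke vs. stream

A LangChain agent can be called in two ways:

agent.invoke(): waits until the agent has finished every tool call, then returns the complete result (blocking)
agent.stream(): yields output as soon as it is produced (streaming)

For an agent that calls tools, invoke can take several seconds before anything appears. With stream, the user watches the AI "type" in real time, which makes for a completely different experience.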
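The difference can be seen without any model at all. The toy sketch below uses a fake token source built from an async generator. The names fakeTokens, invokeStyle and streamStyle are hypothetical, not LangChain APIs:

```typescript
// Toy model of invoke vs. stream using a fake token source (no LangChain involved).
async function* fakeTokens(tokens: string[], delayMs: number): AsyncGenerator<string> {
  for (const t of tokens) {
    // Simulate the model taking time to produce each token.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    yield t;
  }
}

// "invoke" style: the caller sees nothing until every token has arrived.
async function invokeStyle(source: AsyncIterable<string>): Promise<string> {
  let text = '';
  for await (const t of source) text += t;
  return text;
}

// "stream" style: each token is handed to the caller as soon as it arrives.
async function streamStyle(source: AsyncIterable<string>, onToken: (t: string) => void): Promise<void> {
  for await (const t of source) onToken(t);
}
```

Both styles end with the same text. With streamStyle, though, the UI can render each token as it is produced, which is exactly what agent.stream() enables.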

5.2 Types of streamed messages

With stream() and streamMode: 'messages', each yielded item is a [message, metadata] tuple:

// `config` holds the thread_id; it is defined in section 6.1
const stream = await agent.stream(
  { messages: [{ role: 'user', content: 'Calculate 128 times 47 for me' }] },
  { ...config, streamMode: 'messages' },
);

for await (const [message] of stream) {
  console.log(message);  // you will see several kinds of messages
}

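message is a LangChain message object; use message._getType() to check what kind it is:

Type	Meaning
`'ai'`	Output from the LLM (may contain text plus tool calls)
`'tool'`	The result a tool returned after running
`'human'`	The user's message (usually not seen in the stream)

5.3 The two shapes of content

The content field of an AI message can take two shapes, and this is an easy place to trip up:

Shape 1: a string (a plain-text reply with no tool call)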

message.content === "128 × 47 = 6016"

Shape 2: an array (when a tool is called, containing several kinds of blocks)

message.content === [
  { type: 'text', text: 'The result is...' },
  { type: 'tool_use', id: '...', name: 'calculator', input: {...} },
]

So when handling streamed messages, you need to cover both cases:

async function printStream(stream: AsyncIterable<[any, any]>) {
  for await (const [message] of stream) {
    // Only handle AI messages; skip tool / human
    if (message?._getType?.() === 'ai') {
      // Case 1: content is a string
      if (typeof message.content === 'string' && message.content) {
        process.stdout.write(message.content);
      }
      // Case 2: content is an array; look for text blocks
      else if (Array.isArray(message.content)) {
        for (const block of message.content) {
          if (block.type === 'text' && block.text) {
            process.stdout.write(block.text);
          }
        }
      }
    }
  }
}

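Pitfall: at first I only handled string content, and the output came out empty whenever a tool was called, because in that case content becomes an array. Everything worked once I added the array traversal.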
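Pulling the content handling out into a pure function makes both shapes easy to test. Below is a sketch. extractText is a hypothetical helper, and ContentBlock is a loose stand-in for LangChain's content block types. It only assumes that each block has a string type field:

```typescript
// Loose stand-in for message content blocks: only `type` is assumed to exist.
type ContentBlock = { type: string; [key: string]: unknown };
type MessageContent = string | ContentBlock[];

// Hypothetical helper: return the displayable text from either content shape.
function extractText(content: MessageContent): string {
  // Shape 1: a plain string.
  if (typeof content === 'string') return content;
  // Shape 2: an array of blocks; keep text blocks, skip tool_use and anything else.
  let out = '';
  for (const block of content) {
    const text = block.text;
    if (block.type === 'text' && typeof text === 'string') out += text;
  }
  return out;
}
```

printStream could then write extractText(message.content) for each AI message instead of branching inline.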

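6. Multi-turn conversation: letting the agent remember context

6.1 thread_id and memory

A plain LLM call is stateless: the model does not remember what you said in the previous turn. The agent solves this with a checkpointer (a memory store).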

const config = { configurable: { thread_id: 'session-1' } };

MemorySaver stores conversation history per thread_id. All messages with the same thread_id accumulate together, so on every call the agent sees the full conversation so far.
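Conceptually, a checkpointer is a store keyed by thread_id. The sketch below is deliberately simplified. ThreadStore is hypothetical, and LangGraph's real MemorySaver persists full graph checkpoints (state plus metadata), not just a list of messages:

```typescript
// Conceptual sketch only: conversation history keyed by thread_id.
// ThreadStore is hypothetical and much simpler than LangGraph's MemorySaver.
type ChatMessage = { role: 'user' | 'ai'; content: string };

class ThreadStore {
  private threads = new Map<string, ChatMessage[]>();

  // Return a copy so callers cannot mutate the stored history by accident.
  load(threadId: string): ChatMessage[] {
    return [...(this.threads.get(threadId) ?? [])];
  }

  // Append new messages to the thread, creating it on first use.
  append(threadId: string, ...messages: ChatMessage[]): void {
    const history = this.threads.get(threadId) ?? [];
    history.push(...messages);
    this.threads.set(threadId, history);
  }
}
```

Two calls that share 'session-1' see each other's messages, while a new thread_id starts empty. That is the behavior the next example relies on.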

6.2 What it looks like in practice

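// Round 1: calculator
// The stream must be consumed (here via printStream) for the run to complete and print.
await printStream(
  await agent.stream(
    { messages: [{ role: 'user', content: 'Calculate 128 times 47 for me' }] },
    { ...config, streamMode: 'messages' },
  ),
);
// Agent replies: "128 × 47 = 6016"

// Round 2: deliberately push back
await printStream(
  await agent.stream(
    { messages: [{ role: 'user', content: "That's not right, is it?" }] },
    { ...config, streamMode: 'messages' },
  ),
);
// The agent revisits the earlier calculation and re-checks the result,
// because it "remembers" what it computed in the previous round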

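The second message, "That's not right, is it?", contains no numbers at all. The agent still understands that it is questioning the previous result, and that is thread_id + MemorySaver at work.

Keep in mind that MemorySaver only keeps data in process memory, so everything is lost when the process restarts. If you need persistent memory in production, switch to a storage backend such as PostgresSaver.

7. Extended Thinking: seeing the model's reasoning

This is the advanced feature added last: letting the model show its internal reasoning before it gives the final answer.

7.1 What is Extended Thinking?

Extended Thinking is a capability offered by Anthropic. Before producing its final reply, the model first writes an "inner monologue" (thinking) that shows how it reasons step by step.

For the user, this works like a "transparent window": you can see what the AI is "thinking", not just the final answer.

7.2 Enabling thinking

Add the thinking parameter to the ChatAnthropic config: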

const model = new ChatAnthropic({
  model: 'qwen3.7-plus',
  // ...
  maxTokens: 10000,       // must be set explicitly and exceed budget_tokens
  thinking: {
    type: 'enabled',
    budget_tokens: 5000,  // the thinking phase may use at most 5000 tokens
  },
});

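Note: with thinking enabled, set maxTokens explicitly and make sure it is larger than budget_tokens. The Anthropic API requires budget_tokens to be at least 1,024 and less than max_tokens. If maxTokens is left at the library default, it may be smaller than the budget and the request will be rejected.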
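A small pre-flight check can catch a misconfigured budget before the first request. This sketch assumes Anthropic's documented limits for standard extended thinking at the time of writing: budget_tokens must be at least 1,024 and below max_tokens. checkThinkingConfig is a hypothetical helper, not part of @langchain/anthropic. A proxy serving a different model may apply different rules:

```typescript
// Hypothetical pre-flight check (not a LangChain API), based on Anthropic's documented
// constraints for standard extended thinking: 1024 <= budget_tokens < max_tokens.
interface ThinkingSettings {
  maxTokens: number;
  budgetTokens: number;
}

const MIN_THINKING_BUDGET = 1024;

function checkThinkingConfig({ maxTokens, budgetTokens }: ThinkingSettings): string[] {
  const problems: string[] = [];
  if (budgetTokens < MIN_THINKING_BUDGET) {
    problems.push(`budget_tokens (${budgetTokens}) is below the minimum of ${MIN_THINKING_BUDGET}`);
  }
  if (budgetTokens >= maxTokens) {
    problems.push(`budget_tokens (${budgetTokens}) must be less than maxTokens (${maxTokens})`);
  }
  return problems; // an empty array means the settings pass both checks
}
```

The article's settings (maxTokens: 10000, budget_tokens: 5000) pass both checks.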

7.3 Handling thinking blocks

With thinking enabled, the content array of an AI message can contain an additional block type:

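message.content === [
  { type: 'thinking', thinking: 'The user wants 128 × 47; I should use the calculator tool...' },
  { type: 'tool_use', ... },
]
// The final text (e.g. { type: 'text', text: '128 × 47 = 6016' }) arrives in a
// later AI message, after the tool result has been fed back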

Add handling for thinking blocks in printStream:

for (const block of message.content) {
  // thinking block: the model's internal reasoning (shown in gray)
  if (block.type === 'thinking' && block.thinking) {
    process.stdout.write(`\x1b[90m[Thinking] ${block.thinking}\x1b[0m`);
  }
  // text block: the model's final text reply
  if (block.type === 'text' && block.text) {
    process.stdout.write(block.text);
  }
}

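\x1b[90m is an ANSI escape code that switches the terminal to gray, and \x1b[0m resets it. This way the thinking text is visually distinct from the final reply.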
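There is one caveat with the snippet above. In streaming mode, thinking text typically arrives split across many chunks, so writing the [Thinking] prefix for every block can repeat it mid-sentence. The sketch below uses a hypothetical ThinkingPrinter class that remembers whether it is already inside a thinking run. It prints the prefix once and resets the color when regular text resumes. It takes a write callback instead of process.stdout.write so that it can be tested:

```typescript
// Hypothetical streaming-safe printer: prints the gray "[Thinking]" prefix once per
// thinking run, even when the thinking text arrives across many chunks.
type Block = { type: string; [key: string]: unknown };

const GRAY = '\x1b[90m';
const RESET = '\x1b[0m';

class ThinkingPrinter {
  private readonly write: (s: string) => void;
  private inThinking = false;

  constructor(write: (s: string) => void) {
    this.write = write;
  }

  handle(blocks: Block[]): void {
    for (const block of blocks) {
      const thinking = block.thinking;
      const text = block.text;
      if (block.type === 'thinking' && typeof thinking === 'string' && thinking !== '') {
        // Open the gray section only at the start of a thinking run.
        if (!this.inThinking) {
          this.write(`${GRAY}[Thinking] `);
          this.inThinking = true;
        }
        this.write(thinking);
      } else if (block.type === 'text' && typeof text === 'string' && text !== '') {
        this.closeThinking();
        this.write(text);
      }
    }
  }

  // Reset the color and end the line when a thinking run finishes.
  closeThinking(): void {
    if (this.inThinking) {
      this.write(`${RESET}\n`);
      this.inThinking = false;
    }
  }
}
```

In printStream you would create one printer, for example new ThinkingPrinter((s) => process.stdout.write(s)), and pass each AI message's content array to handle(). Call closeThinking() once after the loop, so the terminal color is reset even if the stream ends during a thinking run.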

7.4 Output

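--- Test 1: calculator tool ---
User: Calculate 128 times 47 for me
Assistant: [Thinking] The user wants to calculate 128 times 47. This is a multiplication, so I should use the calculator tool...
128 × 47 = 6016

The gray part is the model's reasoning; the normally colored part is the final answer.

Compatibility note: some models served through Anthropic-compatible endpoints do not support Extended Thinking. In that case no thinking blocks appear and the output is the same as before. With the proxy used here, this did not cause an error.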

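8. Wrap-up and outlook

What we built

Starting from an empty folder, we built, step by step:

TypeScript project skeleton: pnpm init + tsconfig.json + a tsx-based dev setup
Two tools: a calculator (with parameters) and a current-time tool (without), along with an understanding of what Zod schemas are for
An agent: createDeepAgent ties LLM + tools + loop together
Streaming output: agent.stream() + streamMode: 'messages', including the content-shape pitfall
Multi-turn conversation: thread_id + MemorySaver for context memory
Extended Thinking: making the model's reasoning visible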

Running it end to end

pnpm dev

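This runs three test scenarios: a calculator tool call, multi-turn context memory, and a time tool call. It assumes package.json defines a dev script that runs the entry file, for example "dev": "tsx src/index.ts".

What's next

Main track

Let the agent read and write files, run code, and work with the file system.

Side quests

Add more tools (search, database queries, API calls)
Replace MemorySaver with persistent storage
Connect to a real Anthropic Claude model to try native thinking
Add error handling and retries to the agent
Build a web UI that pushes the streamed output to the frontend via SSE

The world of agents is only just opening up, and this project is just a starting point.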
