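Building an AI Agent from Zero to One

Introduction

Agents are everywhere these days, and this year is even being called the "year of the Agent". So how do you build one from scratch? Let's walk through it step by step.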

Prerequisites

An API key (OpenAI, Claude, Azure OpenAI, etc.)
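The code later in this article reads its key and endpoint from environment variables and fails fast if any are missing. As a small sketch of that idea, the helper below reports every missing name in one error. Note that requireEnv and the Env type are hypothetical names made up for this illustration, and the fake environment stands in for process.env.

```typescript
// Hypothetical helper (not part of the original project): pick the named
// variables out of an env-like object and list every missing one at once.
type Env = Record<string, string | undefined>;

function requireEnv<K extends string>(env: Env, names: readonly K[]): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const result = {} as Record<K, string>;
  for (const name of names) {
    result[name] = env[name] as string;
  }
  return result;
}

// Usage with a fake environment; in the real app you would pass process.env.
const fakeEnv: Env = {
  AZURE_OPENAI_API_KEY: "test-key",
  AZURE_OPENAI_ENDPOINT: "https://example.invalid",
};

const config = requireEnv(fakeEnv, ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]);
console.log(Object.keys(config).join(", "));
```

Listing all missing names at once saves a round of fixing one variable, rerunning, and hitting the next one.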

Building the Agent

Initializing the project

mkdir hello-agent
cd hello-agent
pnpm init

pnpm add dotenv openai
pnpm add -D typescript ts-node @types/node

tsconfig.json configuration

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "strict": true,
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "skipLibCheck": true,
    "outDir": "dist",
    "rootDir": "src",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "noImplicitAny": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true
  },
  "include": ["src"],
  "exclude": ["node_modules", "dist"]
}

Startup scripts (in package.json)

"scripts": {
    "build": "tsc",
    "start": "node dist/main.js",
    "dev": "tsc --watch",
    "tsnode": "ts-node src/main.ts"
}

Streaming output

import dotenv from "dotenv";
import { AzureOpenAI } from "openai";

// Load the variables from .env into process.env.
dotenv.config();

const apiKey = process.env.AZURE_OPENAI_API_KEY;
const endpoint = process.env.AZURE_OPENAI_ENDPOINT;
const deployment = process.env.AZURE_OPENAI_DEPLOYMENT;
const apiVersion = process.env.AZURE_OPENAI_API_VERSION;

if (!apiKey || !endpoint || !deployment || !apiVersion) {
  throw new Error("Missing required environment variables");
}

const openai = new AzureOpenAI({ endpoint, apiKey, apiVersion, deployment });
// With the OpenAI API instead of Azure, import OpenAI from "openai" and use:
// const openai = new OpenAI({
//   apiKey: process.env.OPENAI_API_KEY,
// });

// Send one prompt and print the reply to the terminal as it streams in.
async function getChatResponseWithStream(prompt: string): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini-2024-07-18",
    messages: [
      { role: "system", content: "Answer in Chinese." },
      { role: "user", content: prompt },
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    // Each chunk carries a small piece of the reply; some chunks have no text.
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
}

const prompt = "What is this file mainly about?";
getChatResponseWithStream(prompt).catch((error) => {
  console.error("Error fetching chat response:", error);
  process.exit(1);
});
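If you want to try the for await loop above without an API key, here is a minimal sketch that feeds it a fake stream. FakeChunk, fakeStream and collectStream are names invented for this example; the chunk shape is an assumption that mirrors only the fields the loop reads, not the SDK's real type.

```typescript
// Assumed minimal chunk shape: only the fields the loop actually reads.
interface FakeChunk {
  choices: { delta: { content?: string | null } }[];
}

// Hypothetical stand-in for the API stream: yields one chunk per text piece.
async function* fakeStream(pieces: string[]): AsyncGenerator<FakeChunk> {
  for (const piece of pieces) {
    yield { choices: [{ delta: { content: piece } }] };
  }
  // A final chunk without content, which the consumer has to tolerate.
  yield { choices: [{ delta: {} }] };
}

// Consume the stream: hand each piece to onPiece and return the full text.
async function collectStream(
  stream: AsyncIterable<FakeChunk>,
  onPiece: (piece: string) => void
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      onPiece(content);
      full += content;
    }
  }
  return full;
}

collectStream(fakeStream(["Hel", "lo", "!"]), (piece) => console.log("piece:", piece)).then(
  (full) => console.log("full text:", full)
);
```

Returning the accumulated text is also what makes it possible to store the assistant's reply in the conversation history later on.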

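With that, we have streaming output in the terminal. But what if we actually want the model to read this file? One blunt approach is to read the whole file ourselves and pass its contents in the prompt.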

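import { readFileSync } from "fs";

// Read the source file ourselves and paste its contents into the prompt.
const file = readFileSync("src/main.ts", "utf-8");

const prompt = `${file}\n\nWhat is this file mainly about?`;
getChatResponseWithStream(prompt).catch((error) => {
  console.error("Error fetching chat response:", error);
  process.exit(1);
});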

The result looks like this: [screenshot]

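Summarizing a file

Passing the file in directly works, but we would rather let the model decide when a file needs to be read. That is what function calling is for.

Put simply, function calling lets the model extract the intent from the user's input and pick the function that matches it. The model only names the function and its arguments; our code performs the actual call. Building an Agent application relies mainly on this capability, but an Agent is much more than that: it may call several functions, handle more complex tasks, keep memory, and so on. In short, function calling is one part of an Agent.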
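To make the round trip concrete, here is a rough sketch of one turn with a fake model in place of the API. Everything here is an assumption for illustration: the Message and ToolCall shapes are simplified (the real SDK types carry more fields), and fakeModel just pattern-matches on the input instead of understanding it.

```typescript
// Assumed, simplified message shapes for illustration only.
type ToolCall = { id: string; name: string; arguments: string };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

// Hypothetical stand-in for the model: asks for read_file when the user
// mentions a .ts path, and answers once a tool result is available.
function fakeModel(messages: Message[]): Message {
  const last = messages[messages.length - 1];
  if (!last) throw new Error("empty conversation");
  if (last.role === "tool") {
    return { role: "assistant", content: `Summary of ${last.content.length} characters.` };
  }
  const match = last.role === "user" ? last.content.match(/\S+\.ts/) : null;
  if (match) {
    const call: ToolCall = { id: "call_1", name: "read_file", arguments: JSON.stringify({ path: match[0] }) };
    return { role: "assistant", content: null, toolCalls: [call] };
  }
  return { role: "assistant", content: "No tool needed." };
}

// One turn: user message, optional tool round trip, final answer.
function runTurn(
  history: Message[],
  userInput: string,
  tools: Record<string, (args: { path: string }) => string>
): Message[] {
  const messages: Message[] = [...history, { role: "user", content: userInput }];
  const first = fakeModel(messages);
  messages.push(first);
  if (first.role === "assistant" && first.toolCalls) {
    for (const call of first.toolCalls) {
      const handler = tools[call.name];
      const result = handler ? handler(JSON.parse(call.arguments)) : "unsupported tool";
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
    // Second model call: now it can answer using the tool result.
    messages.push(fakeModel(messages));
  }
  return messages;
}

const transcript = runTurn([], "What is src/main.ts about?", {
  read_file: ({ path }) => `// contents of ${path}`,
});
console.log(transcript.map((m) => m.role).join(" -> "));
// prints "user -> assistant -> tool -> assistant"
```

The order of roles is the important part: the assistant message that requests the tool goes into the history before the tool result, and the model is called a second time to produce the final answer.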

A diagram makes the flow easier to follow: [diagram]

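Next we refactor the code above so that it supports three things.

Continuous conversation

Read input from the terminal in a loop, so the session does not end after one question.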

import readline from "readline";

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

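Storing the conversation history

Keep every message in a messages array.

Function calling

Use the model to recognize the user's intent and call the matching function.

The function below reads a file and returns its contents: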

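import { readFileSync } from "fs";

// Run the tool the model asked for; only read_file is supported here.
async function readFile(name: string, args: { path: string }): Promise<string> {
  if (name === "read_file") {
    try {
      const content = readFileSync(args.path, "utf-8");
      return content;
    } catch (e) {
      // Return the error as text so the model can report it to the user.
      return `Failed to read file: ${e}`;
    }
  }
  return "Unsupported function call";
}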
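The arguments for a tool call arrive from the model as a JSON string, so they are not guaranteed to be well formed. As a hedged sketch, a small check before calling readFile could look like this. parseReadFileArgs and ParseResult are hypothetical names introduced for this example, not part of the original project.

```typescript
// Hypothetical helper: turn the model's raw arguments string into a checked value.
type ParseResult = { ok: true; path: string } | { ok: false; error: string };

function parseReadFileArgs(raw: string | undefined): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw || "{}");
  } catch {
    return { ok: false, error: "arguments are not valid JSON" };
  }
  if (typeof value !== "object" || value === null) {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const path = (value as { path?: unknown }).path;
  if (typeof path !== "string" || path.length === 0) {
    return { ok: false, error: "missing string field: path" };
  }
  return { ok: true, path };
}

console.log(parseReadFileArgs('{"path":"src/main.ts"}'));
console.log(parseReadFileArgs("not json"));
```

When the check fails, the error text can be sent back as the tool result, which gives the model a chance to correct its call.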

Continuous conversation with function calling

import readline from "readline";
import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";

async function chatLoop() {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  // Conversation history; every turn is appended so the model keeps context.
  const messages: ChatCompletionMessageParam[] = [
    {
      role: "system",
      content: "Answer in Chinese.",
    },
  ];

  // Stream the reply for the given history and store it as an assistant message.
  async function getChatResponseWithStream(
    history: ChatCompletionMessageParam[]
  ): Promise<void> {
    const stream = await openai.chat.completions.create({
      model: "gpt-4o-mini-2024-07-18",
      messages: history,
      stream: true,
    });

    let msg = "";
    process.stdout.write("AI: ");
    for await (const chunk of stream) {
      const content = chunk.choices?.[0]?.delta?.content;
      if (content) {
        process.stdout.write(content);
        msg += content;
      }
    }

    messages.push({ role: "assistant", content: msg });
    process.stdout.write("\n");
  }
  async function ask() {
    rl.question("You: ", async (input) => {
      messages.push({ role: "user", content: input });
      // First call (not streamed): let the model decide whether it needs a tool.
      const response = await openai.chat.completions.create({
        model: "gpt-4o-mini-2024-07-18",
        messages: messages,
        tools: [
          {
            type: "function",
            function: {
              name: "read_file",
              description: "Read the contents of the given file",
              parameters: {
                type: "object",
                properties: {
                  path: {
                    type: "string",
                    description: "Path of the file to read",
                  },
                },
                required: ["path"],
              },
            },
          },
        ],
        tool_choice: "auto",
      });
      const message = response.choices?.[0]?.message;
      const content = message?.content;
      const functionCall = message?.tool_calls;
      if (message && functionCall && functionCall.length > 0) {
        // Add the assistant message that carries tool_calls once, before the tool results.
        messages.push(message);
        for (const fnCall of functionCall) {
          // Only function tool calls are handled (some SDK versions type other kinds too).
          if (fnCall.type !== "function") continue;
          const fnName = fnCall.function.name;
          if (fnName === "read_file") {
            const args = JSON.parse(fnCall.function.arguments || "{}");
            const result = await readFile(fnCall.function.name, args);
            messages.push({
              role: "tool",
              tool_call_id: fnCall.id,
              content: result,
            });
          }

          process.stdout.write(`\n[tool called]: ${fnCall.function.name}\n`);
        }
        // Second call: stream the final answer now that the tool results are in the history.
        await getChatResponseWithStream(messages);
      } else if (content) {
        // No tool_calls: print the reply and store it as an assistant message.
        process.stdout.write("\nAI: ");
        process.stdout.write(content + "\n");
        messages.push({ role: "assistant", content });
      }
      ask();
    });
  }
  ask();
}
chatLoop();

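Implementing function calling mostly comes down to declaring the available functions in tools. After the user sends a message, the model picks the function it needs, and our code handles each case.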

The result looks like this: [screenshot]
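The chat loop above picks the handler with an if on the function name. Once there are several tools, the same dispatch can be written as a lookup table. This is only a sketch under my own assumptions: ToolHandler, registry, dispatch and the in-memory fakeFiles map are names invented for the example, and the real project reads from disk instead.

```typescript
type ToolHandler = (args: Record<string, unknown>) => string;

// In-memory stand-in for the file system, so the sketch has no side effects.
const fakeFiles = new Map<string, string>([["notes.txt", "hello"]]);

// Tool name -> handler. Adding a tool means adding one entry here.
const registry = new Map<string, ToolHandler>();
registry.set("read_file", (args) => {
  const content = fakeFiles.get(String(args.path));
  return content === undefined ? `file not found: ${String(args.path)}` : content;
});

// Look up the handler, parse the raw JSON arguments, and always return text,
// so every tool call can be answered with a tool message.
function dispatch(name: string, rawArgs: string): string {
  const handler = registry.get(name);
  if (!handler) return `unsupported tool: ${name}`;
  try {
    return handler(JSON.parse(rawArgs || "{}"));
  } catch (e) {
    return `tool ${name} failed: ${e}`;
  }
}

console.log(dispatch("read_file", JSON.stringify({ path: "notes.txt" })));
console.log(dispatch("delete_file", "{}"));
```

Because dispatch returns a string even for unknown tools or bad arguments, the caller can push a tool message for every tool_call_id without special cases.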

Saving the summary

We can now read a file and summarize it. What if we also want to save the summary?

Following the same pattern as before, it takes three steps.

The write function

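import { writeFileSync } from "fs";

// Write the given content to a file and report the outcome as text.
async function writeFile(args: { fName: string; content: string }) {
  try {
    writeFileSync(args.fName, args.content, "utf-8");
    return `Content written to file: ${args.fName}`;
  } catch (e) {
    return `Failed to write file: ${e}`;
  }
}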

Adding the tool definition

{
  type: "function",
  function: {
    name: "write_file",
    description: "Write content to a file",
    parameters: {
      type: "object",
      properties: {
        content: {
          type: "string",
          description: "The content to write",
        },
        fName: {
          type: "string",
          description: "Name of the file to write",
        },
      },
      required: ["content", "fName"],
    },
  },
},
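The read_file and write_file definitions share the same shape, so if the nesting gets tedious, a small builder can generate it. This is a sketch, not part of the original project: FunctionTool is an assumed local type that mirrors the objects in this article rather than the SDK's own type, and defineTool is a hypothetical helper that treats every parameter as a required string.

```typescript
// Assumed local type mirroring the tool objects used in this article.
interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: "string"; description: string }>;
      required: string[];
    };
  };
}

// Hypothetical helper: params maps each parameter name to its description;
// every parameter is declared as a required string.
function defineTool(name: string, description: string, params: Record<string, string>): FunctionTool {
  const properties: FunctionTool["function"]["parameters"]["properties"] = {};
  for (const key of Object.keys(params)) {
    properties[key] = { type: "string", description: params[key] };
  }
  return {
    type: "function",
    function: {
      name,
      description,
      parameters: { type: "object", properties, required: Object.keys(params) },
    },
  };
}

const tools: FunctionTool[] = [
  defineTool("read_file", "Read the contents of the given file", {
    path: "Path of the file to read",
  }),
  defineTool("write_file", "Write content to a file", {
    content: "The content to write",
    fName: "Name of the file to write",
  }),
];
console.log(JSON.stringify(tools[1], null, 2));
```

For two tools the hand-written objects are perfectly fine; a builder like this mainly pays off once the list grows.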

Checking which tool was called and handling it

if (fnName === "write_file") {
  const args = JSON.parse(fnCall.function.arguments || "{}");
  const result = await writeFile(args);
  messages.push({
    role: "tool",
    tool_call_id: fnCall.id,
    content: result,
  });
}

The result looks like this: [screenshot]
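Reading a file, summarizing it and saving the summary can take more than one round of tool calls: read first, then write. The chat loop in this article handles a single round per user message; as an extension, here is a minimal sketch of a loop that keeps going until the model stops asking for tools or a step limit is hit. All of it is assumed for illustration: scriptedModel is a hard-coded stand-in for the API, and the Reply and Step shapes, the files map and the handlers are hypothetical.

```typescript
type Call = { name: string; args: Record<string, string> };
type Reply = { text: string | null; calls: Call[] };
type Step = { role: "user" | "assistant" | "tool"; content: string };

// In-memory stand-in for the file system.
const files = new Map<string, string>([["src/main.ts", "console.log('hi');"]]);

const handlers: Record<string, (args: Record<string, string>) => string> = {
  read_file: (a) => files.get(a.path) ?? `file not found: ${a.path}`,
  write_file: (a) => {
    files.set(a.fName, a.content);
    return `wrote ${a.fName}`;
  },
};

// Hypothetical scripted model: read the file, then save a summary, then finish.
function scriptedModel(log: Step[]): Reply {
  const toolResults = log.filter((s) => s.role === "tool");
  if (toolResults.length === 0) {
    return { text: null, calls: [{ name: "read_file", args: { path: "src/main.ts" } }] };
  }
  if (toolResults.length === 1) {
    const summary = `Summary of ${toolResults[0].content.length} chars`;
    return { text: null, calls: [{ name: "write_file", args: { fName: "summary.md", content: summary } }] };
  }
  return { text: "Saved the summary to summary.md.", calls: [] };
}

// Keep calling the model until it answers without tool calls, up to maxSteps rounds.
function runAgent(userInput: string, model: (log: Step[]) => Reply, maxSteps = 5): Step[] {
  const log: Step[] = [{ role: "user", content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(log);
    if (reply.calls.length === 0) {
      log.push({ role: "assistant", content: reply.text ?? "" });
      return log;
    }
    for (const call of reply.calls) {
      const handler = handlers[call.name];
      log.push({ role: "tool", content: handler ? handler(call.args) : `unsupported tool: ${call.name}` });
    }
  }
  log.push({ role: "assistant", content: "Stopped: step limit reached." });
  return log;
}

const agentLog = runAgent("Summarize src/main.ts and save it", scriptedModel);
console.log(agentLog.map((s) => `${s.role}: ${s.content}`).join("\n"));
```

The step limit is there so that a model that keeps requesting tools cannot spin forever.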

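GitHub repository: github.com/krismile-su...

Summary

Starting from nothing, we built an Agent that reads a file, summarizes it, and saves the result in fewer than 150 lines of code. I encourage you to try it yourself.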