智能体Agent从 0 到 OpenClaw：AI Agent 的完整演进之路

一、核心演进路线总览

表格

阶段	名称	核心组件	关键能力	代码规模
1	硬编码响应器	无（基础交互）	固定输入→固定输出	20 行 +
2	工具调用层	标准化工具接口	可扩展工具调用	50 行 +
3	上下文记忆	短期记忆系统	连续对话能力	80 行 +
4	LLM 决策驱动	推理引擎	自主判断工具 / 回答	120 行 +
5	多模态感知	输入归一化层	统一处理文本 / 命令 / 图片	150 行 +
6	OpenClaw 完整架构	Agent Loop + 网关 + 执行层	生产级稳定性 + 多会话 + 插件化	200 行 +

二、从 0 到 1：逐步构建 OpenClaw

阶段 1：硬编码响应器（最简单的 "伪 Agent"）

目标：实现最基础的「输入→输出」闭环，无智能，仅演示交互流程。

typescript 复制代码

// 极简硬编码Agent
class HardcodedAgent {
  run(input: string): string {
    if (input.includes('你好')) return '你好！我是硬编码小助手';
    if (input.includes('计算') && input.includes('+')) {
      const [a, b] = input.match(/\d+/g)?.map(Number) || [0, 0];
      return `结果：${a + b}`;
    }
    return '抱歉，我只会打招呼和简单加法';
  }
}

// 测试
const agent = new HardcodedAgent();
console.log(agent.run('你好')); // 输出：你好！我是硬编码小助手
console.log(agent.run('计算 10+20')); // 输出：结果：30

核心问题：

所有逻辑硬编码，新增功能需修改源码；
无记忆能力，无法处理连续对话（比如先问 "10+20" 再问 "乘以 3" 会失败）；
无工具扩展机制，新增能力需重写判断逻辑。

阶段 2：工具调用层（OpenClaw 插件化基础）

目标：引入「标准化工具接口」，解决硬编码扩展性问题。

架构图

graph LR A[用户输入] --> B{指令匹配} B -->|/calculator| C[计算器工具] B -->|/weather| D[天气工具] C --> E[返回执行结果] D --> E

核心代码

typescript 复制代码

// 1. 工具标准化接口（OpenClaw核心设计）
    interface Tool {
      name: string;          // 工具唯一标识
      description: string;   // 工具功能描述
      run(args: string): Promise<string>; // 异步执行方法
    }

    // 2. 具体工具实现：计算器
    class CalculatorTool implements Tool {
      name = 'calculator';
      description = '计算加减乘除，格式：数字 运算符 数字（如 10 + 20）';
      async run(args: string): Promise<string> {
        const [aStr, op, bStr] = args.split(' ');
        const a = parseFloat(aStr), b = parseFloat(bStr);
        let result: number;
        switch (op) {
          case '+': result = a + b; break;
          case '-': result = a - b; break;
          case '*': result = a * b; break;
          case '/': result = b === 0 ? NaN : a / b; break;
          default: return `不支持的运算符：${op}`;
        }
        return `计算结果：${isNaN(result) ? '错误' : result}`;
      }
    }

    // 3. 基础工具Agent
    class ToolAgent {
      private tools: Map<string, Tool>;
      constructor() {
        this.tools = new Map([
          [new CalculatorTool().name, new CalculatorTool()]
        ]);
      }

      async run(input: string): Promise<string> {
        if (input.startsWith('/calculator')) {
          const args = input.slice('/calculator'.length).trim();
          return this.tools.get('calculator')!.run(args);
        }
        return '请使用 /calculator 指令';
      }
    }

    // 测试
    async function test() {
      const agent = new ToolAgent();
      console.log(await agent.run('/calculator 100 * 2.5')); // 输出：计算结果：250
    }
    test();

关键改进：

通过Tool接口强制标准化，新增工具只需实现接口，无需修改 Agent 核心逻辑；
Map映射替代硬编码if-else，工具查找更高效、可维护；
异步优先设计，适配 API 调用、文件读写等异步场景（OpenClaw 原生支持异步工具）。

阶段 3：上下文记忆（连续对话的基石）

目标：添加「短期记忆组件」，让 Agent 记住历史对话，支持连续交互。

架构图

graph LR A[用户输入] --> B[感知解析] B --> C[短期记忆] C --> D[执行逻辑] D --> E[输出结果] E --> C %% 结果回写记忆，形成闭环

核心代码（记忆层实现）

typescript 复制代码

 // 短期记忆系统（滑动窗口机制）
    class ShortTermMemory {
      private context: Array<{ role: 'user' | 'agent'; content: string }> = [];
      private maxLen = 5; // 限制记忆长度，避免性能下降

      add(role: 'user' | 'agent', content: string): void {
        this.context.push({ role, content });
        if (this.context.length > this.maxLen) this.context.shift(); // 移除最早记录
      }

      getContext(): string {
        return this.context.map(item => `${item.role}: ${item.content}`).join('\n');
      }
    }

    // 增强Agent（集成记忆）
    class MemoryAgent {
      private tools: Map<string, Tool>;
      private memory: ShortTermMemory;

      constructor() {
        this.tools = new Map([[new CalculatorTool().name, new CalculatorTool()]]);
        this.memory = new ShortTermMemory();
      }

      async run(input: string): Promise<string> {
        // 记录用户输入到记忆
        this.memory.add('user', input);
        
        // 工具调用逻辑
        let result: string;
        if (input.startsWith('/calculator')) {
          const args = input.slice('/calculator'.length).trim();
          result = await this.tools.get('calculator')!.run(args);
        } else if (input.includes('明天') && this.memory.getContext().includes('天气')) {
          // 上下文匹配：自动关联上一次天气查询的城市
          const city = this.memory.getContext().match(/【(\w+)】/)?.[1] || '北京';
          result = `【${city}】明天天气：多云，22℃`;
        } else {
          result = '请使用 /calculator 指令或询问天气相关问题';
        }

        // 记录Agent输出到记忆
        this.memory.add('agent', result);
        return result;
      }
    }

    // 测试：连续对话
    async function test() {
      const agent = new MemoryAgent();
      console.log(await agent.run('/weather 广州')); // 输出：【广州】天气：晴...
      console.log(await agent.run('明天呢')); // 输出：【广州】明天天气：多云...
    }
    test();

核心价值：

滑动窗口机制：maxLen限制记忆长度，避免上下文过长导致性能下降；
双向存储：用户输入和 Agent 输出都存入记忆，形成完整对话链；
上下文匹配：通过getContext()检索历史记录，实现 "模糊查询→精准响应"。

阶段 4：LLM 决策驱动（智能的核心）

目标：用「LLM 推理」替代硬编码决策，让 Agent 自主判断「直接回答」还是「调用工具」，实现真正的智能。

决策流程

graph TD A[用户输入] --> B[组装上下文记忆] B --> C[LLM推理引擎] C --> D{执行计划} D -->|tool| E[调用工具] D -->|answer| F[直接回答] E --> G[结果反馈] F --> G G --> B

核心代码（LLM 推理层）

typescript 复制代码

import axios from 'axios';

    // LLM决策引擎（对接本地Ollama）
    class LLMReasoning {
      private tools: Tool[];
      private llmUrl = 'http://localhost:11434'; // 本地Ollama地址

      constructor(tools: Tool[]) {
        this.tools = tools;
      }

      // 生成工具描述，供LLM理解可用能力
      private getToolsDesc(): string {
        return this.tools.map(t => `- ${t.name}: ${t.description}`).join('\n');
      }

      // 核心：生成结构化执行计划
      async plan(input: string, memory: ShortTermMemory): Promise<{
        action: 'answer' | 'tool';
        content: string | { toolName: string; toolArgs: string };
      }> {
        const prompt = `
          你是智能Agent，拥有以下工具：
          ${this.getToolsDesc()}
          上下文：${memory.getContext()}
          当前输入：${input}
          
          请严格返回JSON，无需其他内容：
          1. 调用工具：{"action":"tool","content":{"toolName":"工具名","toolArgs":"参数"}}
          2. 直接回答：{"action":"answer","content":"回答内容"}
        `.trim();

        // 调用本地LLM模型（如phi3:mini）
        const response = await axios.post(`${this.llmUrl}/api/generate`, {
          model: 'phi3:mini',
          prompt,
          format: 'json',
          stream: false
        });
        return JSON.parse(response.data.response);
      }
    }

    // 增强Agent：集成LLM决策
    class LLMAgent {
      private tools: Map<string, Tool>;
      private memory: ShortTermMemory;
      private reasoning: LLMReasoning;

      constructor() {
        this.tools = new Map([[new CalculatorTool().name, new CalculatorTool()]]);
        this.memory = new ShortTermMemory();
        this.reasoning = new LLMReasoning(Array.from(this.tools.values()));
      }

      async run(input: string): Promise<string> {
        this.memory.add('user', input);
        // LLM决策：生成执行计划
        const plan = await this.reasoning.plan(input, this.memory);
        // 执行计划
        let result: string;
        if (plan.action === 'tool') {
          const { toolName, toolArgs } = plan.content as { toolName: string; toolArgs: string };
          result = await this.tools.get(toolName)!.run(toolArgs);
        } else {
          result = plan.content as string;
        }
        this.memory.add('agent', result);
        return result;
      }
    }

    // 测试：复杂指令（LLM自动拆解）
    async function test() {
      const agent = new LLMAgent();
      console.log(await agent.run('帮我计算100+200，再查上海明天天气'));
    }
    test();

关键突破：

LLM 通过工具描述自主选择最佳执行路径，无需硬编码判断逻辑；
JSON 结构化输出确保解析稳定性，避免格式混乱；
记忆上下文让决策更精准，支持复杂多步任务。

阶段 5：多模态感知（输入归一化）

目标：添加「感知层」，统一处理多种输入类型（文本 / 命令 / 图片），为多模态能力打下基础。

感知层架构

graph LR A[原始输入] --> B[感知层] B --> C[文本输入] B --> D[命令输入] B --> E[图片输入] C --> F[统一结构化输出] D --> F E --> F F --> G[决策层]

核心代码（感知层实现）

typescript 复制代码

 // 多模态感知层（输入归一化）
    class Perception {
      static parseInput(input: any): { type: 'text' | 'command' | 'image'; content: string } {
        if (typeof input === 'string') {
          // 区分普通文本和命令
          return input.startsWith('/') 
            ? { type: 'command', content: input.slice(1).trim() }
            : { type: 'text', content: input.trim() };
        }
        if (input instanceof Buffer || input.type === 'image') {
          // 模拟图片OCR识别（实际项目替换为真实OCR）
          return { type: 'image', content: '图片内容：天气截图' };
        }
        throw new Error(`不支持的输入类型：${typeof input}`);
      }
    }

    // 增强Agent：集成感知层
    class PerceptionAgent {
      private perception: Perception;
      private tools: Map<string, Tool>;
      private memory: ShortTermMemory;
      private reasoning: LLMReasoning;

      constructor() {
        this.perception = new Perception();
        this.tools = new Map([[new CalculatorTool().name, new CalculatorTool()]]);
        this.memory = new ShortTermMemory();
        this.reasoning = new LLMReasoning(Array.from(this.tools.values()));
      }

      async run(input: any): Promise<string> {
        // 统一解析输入
        const parsedInput = this.perception.parseInput(input);
        this.memory.add('user', `${parsedInput.type}: ${parsedInput.content}`);
        // 决策+执行
        const plan = await this.reasoning.plan(parsedInput.content, this.memory);
        let result: string;
        if (plan.action === 'tool') {
          const { toolName, toolArgs } = plan.content as { toolName: string; toolArgs: string };
          result = await this.tools.get(toolName)!.run(toolArgs);
        } else {
          result = plan.content as string;
        }
        this.memory.add('agent', result);
        return result;
      }
    }

    // 测试：多类型输入
    async function test() {
      const agent = new PerceptionAgent();
      console.log(await agent.run('/calculator 500 / 25')); // 命令类型
      console.log(await agent.run('图片内容：上海天气截图')); // 模拟图片输入
    }
    test();

核心价值：

输入归一化：无论输入是文本、命令还是图片，都转为{type, content}标准格式；
多模态扩展：预留图片、音频等输入类型的处理接口；
错误防护：提前校验输入类型，避免非法输入导致崩溃。

阶段 6：OpenClaw 完整架构（生产级 Agent）

目标：整合所有组件，添加「Agent Loop」「网关层」和「执行层」，实现生产级特性。

OpenClaw 完整架构图

graph TD A[用户] --> B[网关层] B --> C[会话管理] B --> D[车道式队列] C --> E[记忆系统] D --> F[Agent Loop] E --> F F --> G[感知层] F --> H[决策层] F --> I[执行层] F --> J[记忆更新] G --> H H --> I I --> J J --> F I --> K[工具层] K --> L[外部服务/本地能力] L --> I

OpenClaw 核心：Agent Loop 实现

Agent Loop 是 OpenClaw 的灵魂，是「感知→决策→执行→记忆→再感知」的无限闭环，公式化表达：

plaintext

css 复制代码

Loop:
  Perceive(感知) → Reason(决策) → Act(执行) → Memorize(记忆) → Repeat(循环)

完整代码（OpenClaw Agent Loop）

typescript 复制代码

    // 1. 网关层：会话管理器
    class SessionManager {
      private sessions: Map<string, { memory: ShortTermMemory; lastActive: Date }> = new Map();

      getSession(sessionId: string): { memory: ShortTermMemory } {
        if (!this.sessions.has(sessionId)) {
          this.sessions.set(sessionId, {
            memory: new ShortTermMemory(),
            lastActive: new Date()
          });
        }
        this.sessions.get(sessionId)!.lastActive = new Date();
        return this.sessions.get(sessionId)!;
      }
    }

    // 2. 网关层：车道式队列（同会话串行执行）
    class LaneQueue {
      private queues: Map<string, Array<() => Promise<void>>> = new Map();

      async enqueue(sessionId: string, task: () => Promise<void>): Promise<void> {
        if (!this.queues.has(sessionId)) this.queues.set(sessionId, []);
        const queue = this.queues.get(sessionId)!;
        queue.push(task);
        if (queue.length === 1) await this.processQueue(sessionId);
      }

      private async processQueue(sessionId: string): Promise<void> {
        const queue = this.queues.get(sessionId)!;
        while (queue.length > 0) {
          const task = queue[0];
          try {
            await task();
          } catch (e) {
            console.error(`任务执行失败：${(e as Error).message}`);
          }
          queue.shift();
        }
      }
    }

    // 3. 执行层（统一执行逻辑）
    class ActionExecutor {
      private tools: Map<string, Tool>;

      constructor(tools: Map<string, Tool>) {
        this.tools = tools;
      }

      async execute(plan: any): Promise<string> {
        if (plan.action === 'tool') {
          const { toolName, toolArgs } = plan.content;
          const tool = this.tools.get(toolName);
          if (!tool) return `未知工具：${toolName}`;
          // 容错重试
          let attempt = 0;
          while (attempt < 2) {
            try {
              return await tool.run(toolArgs);
            } catch (e) {
              attempt++;
              if (attempt === 2) return `工具执行失败：${(e as Error).message}`;
            }
          }
        }
        return plan.content as string;
      }
    }

    // 4. OpenClaw完整Agent
    class OpenClawAgent {
      private perception: Perception;
      private sessionManager: SessionManager;
      private laneQueue: LaneQueue;
      private tools: Map<string, Tool>;
      private reasoning: LLMReasoning;
      private executor: ActionExecutor;
      private maxIterations = 5; // 最大循环次数
      private timeout = 30000;   // 超时时间

      constructor() {
        this.perception = new Perception();
        this.sessionManager = new SessionManager();
        this.laneQueue = new LaneQueue();
        this.tools = new Map([[new CalculatorTool().name, new CalculatorTool()]]);
        this.reasoning = new LLMReasoning(Array.from(this.tools.values()));
        this.executor = new ActionExecutor(this.tools);
      }

      // Agent Loop核心入口
      async run(sessionId: string, input: any): Promise<{ result: string; status: string }> {
        return new Promise((resolve) => {
          // 网关层：任务入队
          this.laneQueue.enqueue(sessionId, async () => {
            const session = this.sessionManager.getSession(sessionId);
            let currentInput = input;
            let iteration = 0;
            let finalResult = '';
            let loopStatus = 'running';
            const startTime = Date.now();

            // ====================== Agent Loop核心循环 ======================
            while (loopStatus === 'running') {
              // 终止条件1：达到最大迭代次数
              if (iteration >= this.maxIterations) {
                loopStatus = 'finished';
                finalResult = `达到最大迭代次数(${this.maxIterations})`;
                break;
              }

              // 终止条件2：超时
              if (Date.now() - startTime > this.timeout) {
                loopStatus = 'timeout';
                finalResult = `任务超时(${this.timeout}ms)`;
                break;
              }

              iteration++;
              // Step1: 感知
              const parsedInput = this.perception.parseInput(currentInput);
              session.memory.add('user', `迭代${iteration}输入：${parsedInput.content}`);

              // Step2: 决策
              const plan = await this.reasoning.plan(parsedInput.content, session.memory);
              if (plan.action === 'finish') {
                loopStatus = 'finished';
                finalResult = plan.content as string;
                break;
              }

              // Step3: 执行
              const actionResult = await this.executor.execute(plan);
              session.memory.add('agent', `迭代${iteration}输出：${actionResult}`);

              // Step4: 反馈
              currentInput = `基于上一轮结果：${actionResult}，继续完成原任务`;
            }

            resolve({ result: finalResult, status: loopStatus });
          });
        });
      }

      // 插件化扩展：动态添加工具
      addTool(tool: Tool): void {
        this.tools.set(tool.name, tool);
        this.reasoning = new LLMReasoning(Array.from(this.tools.values()));
      }
    }

    // 测试：多步任务
    async function testLoop() {
      const agent = new OpenClawAgent();
      const result = await agent.run('session1', '帮我计算10加20，然后把结果乘以3');
      console.log('最终结果：', result.result);
      console.log('Loop状态：', result.status);
    }
    testLoop();

OpenClaw 核心组件详解

表格

组件	作用	生产级特性
网关层	会话管理 + 任务排队	车道式队列（同会话串行、跨会话并行），防止并发冲突
Agent Loop	核心执行循环	多轮迭代 + 三重终止条件（最大迭代、超时、手动终止）
感知层	输入归一化	统一处理文本 / 命令 / 图片，支持多模态扩展
决策层	LLM 推理 + 工具选择	支持 ReAct 推理，返回`finish`动作终止循环
执行层	工具执行 + 结果处理	容错重试、超时控制、工具隔离
记忆层	上下文存储	短期记忆 + 长期记忆（向量数据库），支撑复杂对话
工具层	标准化功能扩展	插件化架构，动态添加 / 移除工具

OpenClaw Agent Loop 核心流程图

flowchart TD A[启动任务] --> B{Loop是否运行?} B -->|是| C[检查终止条件] C -->|超迭代次数| D[任务结束] C -->|超时| E[超时终止] C -->|正常| F[感知输入] F --> G[LLM决策] G --> H{任务完成?} H -->|是| D H -->|否| I[执行动作] I --> J[写入记忆] J --> K[更新下轮输入] K --> B D --> L[输出最终结果] E --> L

三、核心设计哲学总结

模块化解耦：五层架构（感知 / 记忆 / 决策 / 执行 / 网关）独立，任意一层可替换（如规则决策→LLM 决策、文本感知→语音感知）；
闭环复用：「输入→感知→记忆→决策→执行→记忆→输出」的闭环是 OpenClaw 自主完成任务的基础；
生产级鲁棒性：三重终止条件 + 容错重试 + 会话隔离，防止无限循环和系统崩溃；
LLM 驱动：决策完全由 LLM 和 Prompt 控制，无需硬编码复杂逻辑，适配快速迭代；
插件化扩展：工具层标准化接口，支持动态添加功能，满足多样化需求。

四、OpenClaw 官方扩展方向

长期记忆：集成向量数据库（如 Pinecone）实现长期记忆检索；
多 Agent 协作：通过 Supervisor-Worker 模式实现任务拆解与分发；
安全沙箱：为工具执行提供隔离环境，防止恶意代码执行；
可视化监控：Web 界面展示 Loop 迭代过程、工具调用记录、记忆内容。