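Multi-Model Architecture Design: Integrating 10+ AI Models

Part 3 of the blade-code series. It covers everything from a unified interface to model switching, and from cost optimization to fallback strategies, to show how to build a flexible multi-model AI system.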

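Why support multiple models?

When you build an AI application, the first question is usually: "Which model should I use?"

OpenAI GPT-5.2? Powerful, but expensive
Claude Opus 4.6? Strong at code, but rate-limited
DeepSeek V3.2? Cheap, but its stability is questionable
Gemini 2.5? Large free quota, but still capped

My answer: all of them.

blade-code was designed as a multi-model architecture from day one and supports seamless switching among 10+ mainstream models. Users can pick a model based on task type, cost budget, and rate limits, and can even switch dynamically at runtime.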

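Problems with a single model

Relying on a single model leads to these problems:

| Problem | Impact |
| --- | --- |
| High cost | gpt-5.2-pro output costs $168/M tokens |
| Rate limits | Claude caps the number of requests per minute |
| Service outages | When OpenAI goes down, all you can do is wait |
| Capability gaps | Different tasks call for different models |
| Regional restrictions | Some models are unavailable in certain regions |

A multi-model architecture addresses these problems: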

Cost optimization: use cheap models for simple tasks and powerful models for complex ones:

```typescript
// Simple task
const summary = await model.generate('Summarize this text', {
  model: 'deepseek-chat' // $0.28/M input, $0.42/M output
});

// Complex task
const architecture = await model.generate('Design the system architecture', {
  model: 'claude-opus-4.6' // $5/M input, $25/M output
});
```

High availability: automatically switch when the primary model fails:

```typescript
const response = await model.generate(prompt, {
  model: 'gpt-5.2',
  fallback: ['claude-sonnet-4.5', 'deepseek-chat']
});
```

Task matching: use the best-suited model for each task:

```typescript
const taskModelMap = {
  'code-generation': 'claude-opus-4.6',
  'translation': 'gpt-5-mini',
  'reasoning': 'gpt-5.2-pro',
  'chat': 'deepseek-chat',
};
```

Supported models (February 2026)

blade-code currently supports these models:

| Provider | Model | Strengths | Cost (input / output, USD per 1M tokens) |
| --- | --- | --- | --- |
| OpenAI | gpt-5.2 | Flagship, for code and agent tasks | $1.75 / $14 |
| | gpt-5.2-pro | Strongest reasoning | $21 / $168 |
| | gpt-5-mini | Lightweight and fast | $0.25 / $2 |
| | gpt-4.1 | Fine-tunable | $3 / $12 |
| | gpt-4.1-mini | Lightweight fine-tunable version | $0.8 / $3.2 |
| | o4-mini | Reasoning model | $4 / $16 |
| Anthropic | claude-opus-4.6 | Strongest for agents and code | $5 / $25 |
| | claude-sonnet-4.5 | Best price/performance | $3 / $15 |
| | claude-haiku-4.5 | Fast responses | $1 / $5 |
| Google | gemini-2.5-pro | Long context | $1.25 / $5 |
| | gemini-2.5-flash | Free experimental tier | Free |
| DeepSeek | deepseek-chat (V3.2) | Best value for money | $0.28 / $0.42 |
| | deepseek-reasoner (V3.2) | Reasoning mode | $0.28 / $0.42 |
| OpenRouter | Aggregates many providers | Unified interface | Billed per model |

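How to choose?

Code generation: claude-opus-4.6 is the strongest, deepseek-chat the cheapest

Deep reasoning: gpt-5.2-pro is the strongest, deepseek-reasoner offers the best value

Everyday chat: gpt-5-mini or deepseek-chat, cheap and fast

Long documents: gemini-2.5-pro handles very long contexts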

Unified interface design

Architecture overview

```mermaid
graph TB
  subgraph "ModelManager"
    MM[ModelManager]
    MM --> |manages| P1[OpenAI Provider]
    MM --> |manages| P2[Anthropic Provider]
    MM --> |manages| P3[Google Provider]
    MM --> |manages| P4[DeepSeek Provider]
  end
  subgraph "Unified interface"
    PI[Provider Interface]
    PI --> |generate| GEN[Generate response]
    PI --> |stream| STR[Streaming response]
    PI --> |isAvailable| CHK[Check availability]
    PI --> |getModelInfo| INFO[Get model info]
  end
  P1 -.-> |implements| PI
  P2 -.-> |implements| PI
  P3 -.-> |implements| PI
  P4 -.-> |implements| PI
  style MM fill:#4A90D9,color:#fff
  style PI fill:#50C878,color:#fff
```

Every model provider implements the same Provider interface:

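```typescript
// Supporting types, simplified for this article (the real definitions are not shown)
type Message = { role: 'system' | 'user' | 'assistant'; content: string };
type Tool = {
  type: 'function';
  function: { name: string; description?: string; parameters?: Record<string, unknown> };
};
// `arguments` is a JSON-encoded string, as in OpenAI's API
type ToolCall = { id: string; type: 'function'; function: { name: string; arguments: string } };
type StreamChunk =
  | { type: 'content'; content: string }
  | { type: 'tool_calls'; toolCalls: unknown[] };
type ModelInfo = {
  contextWindow: number;
  maxOutputTokens: number;
  costPer1MTokens: { input: number; output: number }; // USD
  capabilities: string[];
};

interface Provider {
  name: string;
  models: string[];

  generate(request: GenerateRequest): Promise<GenerateResponse>;
  stream(request: GenerateRequest): AsyncIterable<StreamChunk>;
  isAvailable(model: string): Promise<boolean>;
  getModelInfo(model: string): ModelInfo;
}

interface GenerateRequest {
  model: string;
  messages: Message[];
  temperature?: number;
  maxTokens?: number;
  tools?: Tool[];
}

interface GenerateResponse {
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  finishReason: 'stop' | 'length' | 'tool_calls';
  toolCalls?: ToolCall[];
}
```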
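Because the interface is this small, a test double takes only a few lines. The sketch below is a hypothetical `EchoProvider` (not part of blade-code) that returns the last user message. It redeclares simplified copies of the interfaces so it compiles on its own, and its word-count "tokens" are an illustration, not a real tokenizer.

```typescript
// Simplified copies of the interfaces above, so this sketch compiles standalone.
interface Message { role: 'system' | 'user' | 'assistant'; content: string }
interface GenerateRequest { model: string; messages: Message[]; maxTokens?: number }
interface GenerateResponse {
  content: string;
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
  finishReason: 'stop' | 'length' | 'tool_calls';
}
type StreamChunk = { type: 'content'; content: string };
interface ModelInfo {
  contextWindow: number;
  maxOutputTokens: number;
  costPer1MTokens: { input: number; output: number };
  capabilities: string[];
}
interface Provider {
  name: string;
  models: string[];
  generate(request: GenerateRequest): Promise<GenerateResponse>;
  stream(request: GenerateRequest): AsyncIterable<StreamChunk>;
  isAvailable(model: string): Promise<boolean>;
  getModelInfo(model: string): ModelInfo;
}

// Hypothetical provider that echoes the last user message; handy for unit tests.
class EchoProvider implements Provider {
  name = 'echo';
  models = ['echo-1'];

  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    const last = [...request.messages].reverse().find(m => m.role === 'user');
    const content = last?.content ?? '';
    // Crude "token" count: whitespace-separated words (illustration only).
    const count = (s: string) => s.split(/\s+/).filter(Boolean).length;
    const promptTokens = request.messages.reduce((n, m) => n + count(m.content), 0);
    const completionTokens = count(content);
    return {
      content,
      usage: { promptTokens, completionTokens, totalTokens: promptTokens + completionTokens },
      finishReason: 'stop',
    };
  }

  async *stream(request: GenerateRequest): AsyncIterable<StreamChunk> {
    const { content } = await this.generate(request);
    // Emit one chunk per word to mimic incremental output.
    for (const word of content.split(' ')) {
      yield { type: 'content', content: word };
    }
  }

  async isAvailable(model: string): Promise<boolean> {
    return this.models.includes(model);
  }

  getModelInfo(model: string): ModelInfo {
    if (!this.models.includes(model)) throw new Error(`Unknown model: ${model}`);
    return {
      contextWindow: 4096,
      maxOutputTokens: 4096,
      costPer1MTokens: { input: 0, output: 0 },
      capabilities: ['text'],
    };
  }
}
```

A double like this lets the fallback and routing logic be unit-tested without network calls or API keys.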

OpenAI Provider implementation

```typescript
import OpenAI from 'openai';

class OpenAIProvider implements Provider {
  name = 'openai';
  models = ['gpt-5.2', 'gpt-5.2-pro', 'gpt-5-mini', 'gpt-4.1', 'o4-mini'];

  private client: OpenAI;

  constructor(apiKey: string) {
    this.client = new OpenAI({ apiKey });
  }

  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    const response = await this.client.chat.completions.create({
      model: request.model,
      messages: request.messages,
      // Note: some reasoning models only accept the default temperature
      temperature: request.temperature,
      // Reasoning models (o-series, GPT-5 family) reject the legacy max_tokens parameter
      max_completion_tokens: request.maxTokens,
      tools: request.tools,
    });

    const choice = response.choices[0];
    return {
      content: choice.message.content ?? '',
      usage: {
        promptTokens: response.usage?.prompt_tokens ?? 0,
        completionTokens: response.usage?.completion_tokens ?? 0,
        totalTokens: response.usage?.total_tokens ?? 0,
      },
      // Collapse OpenAI's other finish reasons (e.g. content_filter) onto 'stop'
      finishReason:
        choice.finish_reason === 'length' || choice.finish_reason === 'tool_calls'
          ? choice.finish_reason
          : 'stop',
      toolCalls: choice.message.tool_calls,
    };
  }

  async *stream(request: GenerateRequest): AsyncIterable<StreamChunk> {
    const stream = await this.client.chat.completions.create({
      model: request.model,
      messages: request.messages,
      stream: true,
    });

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;
      if (delta?.content) {
        yield { type: 'content', content: delta.content };
      }
      if (delta?.tool_calls) {
        // Tool-call fragments arrive incrementally; callers must merge them by index
        yield { type: 'tool_calls', toolCalls: delta.tool_calls };
      }
    }
  }

  async isAvailable(model: string): Promise<boolean> {
    try {
      await this.client.models.retrieve(model);
      return true;
    } catch {
      return false;
    }
  }

  getModelInfo(model: string): ModelInfo {
    const infoMap: Record<string, ModelInfo> = {
      'gpt-5.2': {
        contextWindow: 400000,
        maxOutputTokens: 128000,
        costPer1MTokens: { input: 1.75, output: 14 },
        capabilities: ['text', 'vision', 'tools', 'agents'],
      },
      'gpt-5-mini': {
        contextWindow: 400000,
        maxOutputTokens: 128000,
        costPer1MTokens: { input: 0.25, output: 2 },
        capabilities: ['text', 'vision', 'tools'],
      },
    };
    const info = infoMap[model];
    if (!info) {
      throw new Error(`No model info for ${model}`);
    }
    return info;
  }
}
```

Anthropic Provider implementation

Anthropic's message format differs from OpenAI's, so messages have to be converted:

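```typescript
import Anthropic from '@anthropic-ai/sdk';

class AnthropicProvider implements Provider {
  name = 'anthropic';
  models = [
    'claude-opus-4.6',
    'claude-sonnet-4.5',
    'claude-haiku-4.5',
  ];

  private client: Anthropic;

  constructor(apiKey: string) {
    this.client = new Anthropic({ apiKey });
  }

  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    // Convert the message format: OpenAI -> Anthropic
    const { system, messages } = this.convertMessages(request.messages);

    const response = await this.client.messages.create({
      // Assumption: short names map to Anthropic's hyphenated IDs (e.g. claude-sonnet-4-5)
      model: request.model.replace(/\./g, '-'),
      system,
      messages,
      max_tokens: request.maxTokens ?? 8192, // required by the Messages API
      temperature: request.temperature,
      // Anthropic tools use { name, description, input_schema } instead of OpenAI's shape
      tools: request.tools?.map(t => ({
        name: t.function.name,
        description: t.function.description,
        input_schema: { ...t.function.parameters, type: 'object' as const },
      })),
    });

    return {
      content: this.extractContent(response.content),
      usage: {
        promptTokens: response.usage.input_tokens,
        completionTokens: response.usage.output_tokens,
        totalTokens: response.usage.input_tokens + response.usage.output_tokens,
      },
      // Map Anthropic stop reasons onto the unified set
      finishReason:
        response.stop_reason === 'max_tokens' ? 'length'
        : response.stop_reason === 'tool_use' ? 'tool_calls'
        : 'stop',
      toolCalls: this.extractToolCalls(response.content),
    };
  }

  private convertMessages(messages: Message[]): {
    system: string | undefined;
    messages: Anthropic.MessageParam[];
  } {
    // Move system messages into Anthropic's top-level `system` field
    const systemMessages = messages.filter(m => m.role === 'system');
    const system = systemMessages.map(m => m.content).join('\n\n') || undefined;

    // Convert the remaining messages (plain text turns only)
    const anthropicMessages = messages
      .filter(m => m.role !== 'system')
      .map(m => ({
        role: m.role as 'user' | 'assistant',
        content: m.content,
      }));

    return { system, messages: anthropicMessages };
  }

  // Concatenate the text blocks of the response
  private extractContent(blocks: Anthropic.ContentBlock[]): string {
    return blocks.map(b => (b.type === 'text' ? b.text : '')).join('');
  }

  // Convert tool_use blocks into OpenAI-style tool calls
  private extractToolCalls(blocks: Anthropic.ContentBlock[]): ToolCall[] | undefined {
    const calls = blocks.flatMap(b =>
      b.type === 'tool_use'
        ? [{ id: b.id, type: 'function' as const, function: { name: b.name, arguments: JSON.stringify(b.input) } }]
        : [],
    );
    return calls.length > 0 ? calls : undefined;
  }

  // stream(), isAvailable() and getModelInfo() follow the same pattern and are omitted here
}
```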
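The `convertMessages` above only maps plain text turns. In a tool-use loop the history also contains assistant tool calls and tool results, which the two APIs encode differently: OpenAI puts `tool_calls` on the assistant message and sends results as separate `role: 'tool'` messages, while Anthropic expects `tool_use` blocks from the assistant and `tool_result` blocks inside a user message. A minimal sketch of that mapping follows; `toAnthropicHistory` and the simplified types are hypothetical, not blade-code's code.

```typescript
// OpenAI-style history entries (simplified).
type OpenAIMessage =
  | { role: 'system' | 'user'; content: string }
  | {
      role: 'assistant';
      content: string | null;
      tool_calls?: { id: string; function: { name: string; arguments: string } }[];
    }
  | { role: 'tool'; tool_call_id: string; content: string };

// Anthropic content blocks used here (a subset of the Messages API).
type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; tool_use_id: string; content: string };

interface AnthropicMessage { role: 'user' | 'assistant'; content: AnthropicBlock[] }

function toAnthropicHistory(messages: OpenAIMessage[]): { system?: string; messages: AnthropicMessage[] } {
  const systemParts: string[] = [];
  const out: AnthropicMessage[] = [];

  // Append to the previous message when the role repeats, otherwise start a new one.
  const push = (role: 'user' | 'assistant', blocks: AnthropicBlock[]) => {
    const last = out[out.length - 1];
    if (last && last.role === role) last.content.push(...blocks);
    else out.push({ role, content: blocks });
  };

  for (const m of messages) {
    switch (m.role) {
      case 'system':
        systemParts.push(m.content);
        break;
      case 'user':
        push('user', [{ type: 'text', text: m.content }]);
        break;
      case 'assistant': {
        const blocks: AnthropicBlock[] = [];
        if (m.content) blocks.push({ type: 'text', text: m.content });
        for (const call of m.tool_calls ?? []) {
          // OpenAI sends arguments as a JSON string; Anthropic wants the parsed object.
          blocks.push({ type: 'tool_use', id: call.id, name: call.function.name, input: JSON.parse(call.function.arguments) });
        }
        push('assistant', blocks);
        break;
      }
      case 'tool':
        // Tool results go back to Claude inside a user message.
        push('user', [{ type: 'tool_result', tool_use_id: m.tool_call_id, content: m.content }]);
        break;
    }
  }

  return { system: systemParts.length ? systemParts.join('\n\n') : undefined, messages: out };
}
```

Merging consecutive same-role turns also groups the results of parallel tool calls into a single user message.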

Model switching and fallback

Runtime switching

Users can switch models at any time:

```typescript
class ModelManager {
  private currentModel: string;
  private providers: Map<string, Provider>;

  switchModel(model: string): void {
    const provider = this.getProviderForModel(model);
    if (!provider) {
      throw new Error(`Model ${model} not supported`);
    }
    this.currentModel = model;
    this.logger.info(`Switched to model: ${model}`);
  }

  private getProviderForModel(model: string): Provider | undefined {
    for (const provider of this.providers.values()) {
      if (provider.models.includes(model)) {
        return provider;
      }
    }
    return undefined;
  }
}
```

CLI usage:

```bash
# Specify at startup
blade --model=claude-opus-4.6 "Optimize the code"

# Switch at runtime
> /model gpt-5.2
✅ Switched to gpt-5.2

> /model deepseek-chat
✅ Switched to deepseek-chat
```

Automatic fallback

When the primary model fails, the fallback models are tried automatically:

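```mermaid
flowchart TD
  A[Start request] --> B[Try model, primary first]
  B --> C{Success?}
  C -->|Yes| D[Return result]
  C -->|No| E{Retryable error?}
  E -->|No| F[Throw error]
  E -->|Yes| G{More fallback models?}
  G -->|No| F
  G -->|Yes| H[Switch to next model]
  H --> B
  style D fill:#90EE90
  style F fill:#FFB6C6
```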

```typescript
class ModelManager {
  async generateWithFallback(
    prompt: string,
    options: GenerateOptions
  ): Promise<GenerateResponse> {
    // Primary model first, then the fallback chain in order
    const models = [
      options.model || this.currentModel,
      ...(options.fallback || this.defaultFallbackChain),
    ];

    let lastError: Error | undefined;

    for (const model of models) {
      try {
        this.logger.info(`Trying model: ${model}`);
        return await this.generate(prompt, { ...options, model });
      } catch (error) {
        this.logger.warn(`Model ${model} failed:`, error);
        lastError = error as Error;

        // Non-retryable errors (e.g. a malformed request) fail fast
        if (!this.shouldRetry(error)) {
          throw error;
        }
      }
    }

    throw new Error(`All models failed. Last error: ${lastError?.message}`);
  }

  private shouldRetry(error: any): boolean {
    // Rate limited or service unavailable: try the next model
    if (error?.status === 429 || error?.status === 503) return true;
    // Authentication failure or invalid request: don't retry
    if (error?.status === 401 || error?.status === 400) return false;
    // Anything else (network errors, other 5xx): try the next model
    return true;
  }
}
```
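Case 2 below lists a `retryConfig` with `maxRetries` and `backoffMs`, but `generateWithFallback` moves to the next model on the first failure. One way to retry each model a few times before falling back is an exponential-backoff wrapper. `withRetry` is a hypothetical helper, sketched here without jitter:

```typescript
// Hypothetical helper: retry one async call with exponential backoff.
// Waits backoffMs, 2 * backoffMs, 4 * backoffMs, ... between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  backoffMs: number,
  isRetryable: (error: unknown) => boolean = () => true,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      const delay = backoffMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

Inside the fallback loop it would wrap the per-model call, e.g. `withRetry(() => this.generate(prompt, { ...options, model }), 3, 1000, e => this.shouldRetry(e))`, so a transient 429 is retried on the same model before the chain advances.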

Configuring the fallback chain:

```typescript
const config = {
  defaultModel: 'claude-opus-4.6',
  fallbackChain: [
    'gpt-5.2',
    'deepseek-chat',
    'gemini-2.5-flash',
  ],
};
```

Smart routing

Automatically choose a model based on the task type:

```mermaid
flowchart LR
  A[User input] --> B{Detect task type}
  B -->|Code generation| C[claude-opus-4.6]
  B -->|Translation| D[gpt-5-mini]
  B -->|Reasoning| E[gpt-5.2-pro]
  B -->|Everyday chat| F[deepseek-chat]
  style C fill:#E8D5B7
  style D fill:#B7D5E8
  style E fill:#D5B7E8
  style F fill:#B7E8D5
```

```typescript
class ModelRouter {
  // Hypothetical default for tasks with no mapping
  private defaultModel = 'deepseek-chat';

  private taskModelMap: Record<string, string> = {
    'code-generation': 'claude-opus-4.6',
    'code-review': 'claude-opus-4.6',
    'translation': 'gpt-5-mini',
    'reasoning': 'gpt-5.2-pro',
    'chat': 'deepseek-chat',
    'summarization': 'gpt-5-mini',
  };

  selectModel(task: string, userPreference?: string): string {
    // An explicit user choice always wins
    if (userPreference) return userPreference;
    return this.taskModelMap[task] || this.defaultModel;
  }

  // Keyword heuristics. The patterns include Chinese keywords (e.g. 写代码 "write code",
  // 翻译 "translate") so Chinese prompts are routed too. The first matching rule wins.
  async detectTaskType(prompt: string): Promise<string> {
    if (/写代码|生成代码|implement|create function/i.test(prompt)) {
      return 'code-generation';
    }
    if (/审查|review|check code/i.test(prompt)) {
      return 'code-review';
    }
    if (/翻译|translate/i.test(prompt)) {
      return 'translation';
    }
    if (/推理|分析|reasoning|analyze/i.test(prompt)) {
      return 'reasoning';
    }
    return 'chat';
  }
}
```

Cost optimization

Cost tracking

Track the cost of every request in real time:

```typescript
class CostTracker {
  private totalCost = 0;
  private requestCount = 0;
  private costByModel: Map<string, number> = new Map();

  constructor(
    private modelManager: { getModelInfo(model: string): ModelInfo },
    private logger: { info(message: string): void } = console,
  ) {}

  trackUsage(model: string, usage: Usage): void {
    const modelInfo = this.modelManager.getModelInfo(model);
    const cost = this.calculateCost(usage, modelInfo);

    this.totalCost += cost;
    this.requestCount += 1; // needed for the per-request average
    this.costByModel.set(
      model,
      (this.costByModel.get(model) || 0) + cost
    );

    this.logger.info(`Cost: $${cost.toFixed(4)} (Total: $${this.totalCost.toFixed(4)})`);
  }

  private calculateCost(usage: Usage, modelInfo: ModelInfo): number {
    const inputCost = (usage.promptTokens / 1_000_000) * modelInfo.costPer1MTokens.input;
    const outputCost = (usage.completionTokens / 1_000_000) * modelInfo.costPer1MTokens.output;
    return inputCost + outputCost;
  }

  getReport(): CostReport {
    return {
      totalCost: this.totalCost,
      costByModel: Object.fromEntries(this.costByModel),
      // Avoid dividing by zero before the first request
      averageCostPerRequest: this.requestCount > 0 ? this.totalCost / this.requestCount : 0,
    };
  }
}
```

Example output:

```text
💰 Cost Report:
  Total: $1.85
  By Model:
    - claude-opus-4.6: $1.20 (65%)
    - gpt-5-mini: $0.35 (19%)
    - deepseek-chat: $0.30 (16%)
  Average per request: $0.09
```
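To make the `calculateCost` formula concrete, here is the same arithmetic for one request of 10k input tokens and 2k output tokens across three models, using the per-1M prices from the table above. The token counts are illustrative, and `requestCost` is a hypothetical standalone version of the formula.

```typescript
// Prices in USD per 1M tokens, copied from the table above (February 2026).
const PRICES: Record<string, { input: number; output: number }> = {
  'claude-opus-4.6': { input: 5, output: 25 },
  'gpt-5-mini': { input: 0.25, output: 2 },
  'deepseek-chat': { input: 0.28, output: 0.42 },
};

// Same formula as CostTracker.calculateCost: tokens / 1M * price per 1M.
function requestCost(model: string, promptTokens: number, completionTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price for ${model}`);
  return (promptTokens / 1_000_000) * price.input + (completionTokens / 1_000_000) * price.output;
}

// One request with 10k input tokens and 2k output tokens.
for (const model of Object.keys(PRICES)) {
  console.log(`${model}: $${requestCost(model, 10_000, 2_000).toFixed(4)}`);
}
```

This prints $0.1000 for claude-opus-4.6, $0.0065 for gpt-5-mini, and $0.0036 for deepseek-chat: for an identical request, Opus costs roughly 27 times as much as deepseek-chat.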

Budget control

Set daily and monthly spending caps:

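```typescript
class BudgetController {
  private dailySpent = 0;
  private monthlySpent = 0;
  // Calendar day and month (local time) the counters belong to
  private day = '';
  private month = '';

  constructor(private dailyLimit: number, private monthlyLimit: number) {}

  // Throws if the estimated cost would push spending past either limit
  async checkBudget(estimatedCost: number): Promise<boolean> {
    this.rollOver();
    if (this.dailySpent + estimatedCost > this.dailyLimit) {
      throw new Error(`Daily budget exceeded: $${this.dailyLimit}`);
    }
    if (this.monthlySpent + estimatedCost > this.monthlyLimit) {
      throw new Error(`Monthly budget exceeded: $${this.monthlyLimit}`);
    }
    return true;
  }

  recordSpending(cost: number): void {
    this.rollOver();
    this.dailySpent += cost;
    this.monthlySpent += cost;
  }

  // Reset a counter when its period changes (in memory only, so it resets on restart)
  private rollOver(): void {
    const now = new Date();
    const day = now.toDateString();
    const month = `${now.getFullYear()}-${now.getMonth()}`;
    if (day !== this.day) { this.day = day; this.dailySpent = 0; }
    if (month !== this.month) { this.month = month; this.monthlySpent = 0; }
  }
}
```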
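`checkBudget` needs an `estimatedCost` before the request is sent, when the real token counts are not yet known. A conservative sketch follows, assuming a rough 4-characters-per-token heuristic (a real tokenizer gives different counts, and this ratio likely undercounts Chinese text) and charging the full `maxTokens` output budget. Both helper names are hypothetical.

```typescript
// Assumption: ~4 characters per token, a rough heuristic for English text.
const CHARS_PER_TOKEN = 4;

interface Price { input: number; output: number } // USD per 1M tokens

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Upper-bound estimate: assume the model uses its entire maxTokens output budget.
function estimateMaxCost(promptText: string, maxTokens: number, price: Price): number {
  const inputTokens = estimateTokens(promptText);
  return (inputTokens / 1_000_000) * price.input + (maxTokens / 1_000_000) * price.output;
}
```

For example, call `budget.checkBudget(estimateMaxCost(prompt, 4096, { input: 5, output: 25 }))` before a claude-opus-4.6 request. Because the output side assumes the worst case, the estimate errs toward refusing early rather than overspending.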

Money-saving tips

1. Use cheap models for simple tasks

```typescript
// ❌ Wasteful
const expensiveSummary = await model.generate('Summarize this text', {
  model: 'gpt-5.2' // $1.75/M input
});

// ✅ Cheaper
const cheapSummary = await model.generate('Summarize this text', {
  model: 'deepseek-chat' // $0.28/M input
});
```

2. Use prompt caching

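OpenAI, Anthropic, and DeepSeek all discount cached input. For the models below, a cache hit cuts the input price by 90%:

```typescript
// OpenAI (gpt-5.2): cached input $0.175/M (vs $1.75/M)
// Anthropic (claude-opus-4.6): cache read $0.50/M (vs $5/M)
// DeepSeek (deepseek-chat): cache hit $0.028/M (vs $0.28/M)
```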
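What caching saves in practice depends on the hit rate. The effective input price is a weighted average of the cached and uncached prices; here is a small sketch with a hypothetical helper:

```typescript
// Effective input price (USD per 1M tokens) when a fraction `hitRate` of the
// input tokens is served from the prompt cache at `cachedPrice`.
function blendedInputPrice(basePrice: number, cachedPrice: number, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError('hitRate must be between 0 and 1');
  }
  return hitRate * cachedPrice + (1 - hitRate) * basePrice;
}

// gpt-5.2 at an 80% cache hit rate, using the prices listed above.
console.log(blendedInputPrice(1.75, 0.175, 0.8).toFixed(2)); // "0.49"
```

At an 80% hit rate, gpt-5.2 input drops from $1.75/M to about $0.49/M. Anthropic also charges a premium for writing to the cache, which this sketch ignores, so prompts that are cached but rarely reused can end up costing more.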

3. Compress the context

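```typescript
// ❌ Wasteful: send the entire history
const fullResponse = await model.generate(prompt, {
  messages: allMessages // could be thousands of messages
});

// ✅ Cheaper: keep the system prompt plus only the most recent messages
const systemMessages = allMessages.filter(m => m.role === 'system');
const recentMessages = allMessages.filter(m => m.role !== 'system').slice(-10);
const trimmedResponse = await model.generate(prompt, {
  messages: [...systemMessages, ...recentMessages]
});
```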
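Keeping the last 10 messages is a fixed count, but messages vary widely in length, so a token budget is often a better limit. The sketch below always keeps system messages and then adds the newest messages until the budget is spent. It assumes the same rough 4-characters-per-token estimate (a real tokenizer would be more accurate, especially for Chinese text), and `trimToBudget` is a hypothetical name.

```typescript
interface Message { role: 'system' | 'user' | 'assistant'; content: string }

// Rough token estimate (assumption: ~4 characters per token).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep every system message, then the newest other messages that fit in the budget.
function trimToBudget(messages: Message[], budgetTokens: number): Message[] {
  const system = messages.filter(m => m.role === 'system');
  let used = system.reduce((n, m) => n + estimateTokens(m.content), 0);
  const kept: Message[] = [];

  // Walk backwards from the newest message; stop at the first one that does not fit.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === 'system') continue;
    const cost = estimateTokens(m.content);
    if (used + cost > budgetTokens) break;
    kept.push(m);
    used += cost;
  }

  // kept is newest-first; restore chronological order.
  return [...system, ...kept.reverse()];
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, with no gaps in the middle of the conversation.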

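Case studies

Case 1: Cost-sensitive code generation

Requirement: generate a lot of code on a limited budget.

Strategy: write a first draft with deepseek-chat (cheap), then have claude-opus-4.6 review and refine it (accurate).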

```typescript
async function generateCodeWithBudget(task: string): Promise<string> {
  // Step 1: the cheap model writes the draft
  const draft = await model.generate(task, {
    model: 'deepseek-chat', // $0.28/M input
  });

  // Step 2: the strong model reviews and improves it
  const review = await model.generate(
    `Review and improve this code:\n${draft.content}`,
    { model: 'claude-opus-4.6' } // $5/M input
  );

  return review.content;
}
```

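Cost comparison:

Claude Opus 4.6 throughout: $5/M input
Mixed strategy: roughly $1/M input (about 80% less)

The savings hinge on the review step: Opus still reads the whole draft at $5/M and bills its reply at $25/M output, so the mix only pays off when the review returns much less text than a full generation would, such as a short patch instead of a complete rewrite.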
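A worked example shows where the savings come from. Assume, purely for illustration, a 2k-token task and 4k tokens of generated code, treat the review prompt as just the draft, and compare three setups using the table prices:

```typescript
// Prices in USD per 1M tokens, from the table above.
type Price = { input: number; output: number };
const OPUS: Price = { input: 5, output: 25 };
const DEEPSEEK: Price = { input: 0.28, output: 0.42 };

// Cost of one call in USD.
const cost = (p: Price, inputTokens: number, outputTokens: number): number =>
  (inputTokens * p.input + outputTokens * p.output) / 1_000_000;

// Illustrative sizes (assumptions): a 2k-token task that yields 4k tokens of code.
const TASK_TOKENS = 2_000;
const CODE_TOKENS = 4_000;
const PATCH_TOKENS = 500; // assumed size of a review that returns only a patch

// Opus reads the task and writes all of the code.
const allOpus = cost(OPUS, TASK_TOKENS, CODE_TOKENS);
// DeepSeek drafts; Opus reads the draft and returns the full rewritten code.
const mixedFullRewrite = cost(DEEPSEEK, TASK_TOKENS, CODE_TOKENS) + cost(OPUS, CODE_TOKENS, CODE_TOKENS);
// DeepSeek drafts; Opus reads the draft and returns only a short patch.
const mixedPatch = cost(DEEPSEEK, TASK_TOKENS, CODE_TOKENS) + cost(OPUS, CODE_TOKENS, PATCH_TOKENS);

console.log({ allOpus, mixedFullRewrite, mixedPatch });
```

Under these assumptions, all-Opus costs about $0.110 per task, a review that rewrites the full code about $0.122 (slightly more), and a review that returns a 500-token patch about $0.035 (roughly 68% less). Output tokens dominate at Opus prices, so the review prompt should ask for targeted changes rather than a full rewrite.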

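Case 2: High-availability production environment

Requirement: the service must stay up, even if one model goes down.

Strategy: configure a multi-level fallback chain.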

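```typescript
const config = {
  primaryModel: 'claude-opus-4.6',
  fallbackChain: [
    'gpt-5.2',           // first fallback
    'deepseek-chat',     // second fallback
    'gemini-2.5-flash',  // third fallback (free)
  ],
  // Note: generateWithFallback does not read retryConfig; per-model retries
  // with backoff have to be added separately
  retryConfig: {
    maxRetries: 3,
    backoffMs: 1000,
  },
};

async function generateWithHighAvailability(prompt: string): Promise<string> {
  // generateWithFallback reads `model` and `fallback`, so map the config fields explicitly
  const response = await model.generateWithFallback(prompt, {
    model: config.primaryModel,
    fallback: config.fallbackChain,
  });
  return response.content;
}
```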

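Availability (illustrative figures):

Single model: 99.9%
Four-model fallback chain: 99.99%+

The gain assumes the providers fail independently; an outage in your own network or in a dependency they share takes the whole chain down at once.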
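Where do numbers like these come from? If each model's outages were independent, the chain would fail only when every model is down at the same time. Here is a sketch of that calculation, with a hypothetical helper name:

```typescript
// Probability that at least one model in the chain answers, assuming each model's
// outages are independent of the others (optimistic: shared networks, client bugs,
// or traffic bursts can take several providers down together).
function chainAvailability(availabilities: number[]): number {
  const allDown = availabilities.reduce((p, a) => p * (1 - a), 1);
  return 1 - allDown;
}

console.log(chainAvailability([0.999]));                      // one model
console.log(chainAvailability([0.999, 0.999, 0.999, 0.999])); // four-model chain
```

With four models at 99.9% each, the independent estimate is 1 - 0.001^4, far above 99.99%, so 99.99%+ is a conservative figure; correlated failures are what keep real-world availability below the formula.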

Case 3: Smart task routing

Requirement: automatically pick the best model for each task type.

```typescript
async function smartGenerate(prompt: string): Promise<string> {
  const taskType = await router.detectTaskType(prompt);
  const model = router.selectModel(taskType);

  console.log(`Task: ${taskType}, Model: ${model}`);

  const response = await modelManager.generate(prompt, { model });
  return response.content;
}

// Examples (matched by the keyword rules in detectTaskType)
await smartGenerate('Implement a quicksort function');
// → Task: code-generation, Model: claude-opus-4.6

await smartGenerate('Translate this paragraph');
// → Task: translation, Model: gpt-5-mini

await smartGenerate('Analyze the time complexity of this algorithm');
// → Task: reasoning, Model: gpt-5.2-pro
```

Future plans

Local model support

We plan to support open-source models that run locally:

```typescript
// Ollama integration (planned)
const localProvider = new OllamaProvider({
  baseUrl: 'http://localhost:11434',
  models: ['llama4', 'codellama', 'qwen2.5'],
});

// Mix local and cloud models
const response = await model.generate(prompt, {
  model: 'llama4',        // local (free)
  fallback: ['gpt-5.2'],  // cloud fallback (paid)
});
```
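`OllamaProvider` does not exist yet. Assuming it talks to Ollama's native `POST /api/chat` endpoint with streaming disabled, its `generate()` might look roughly like this. The function names are hypothetical, and the response type lists only the fields used here.

```typescript
interface Message { role: 'system' | 'user' | 'assistant'; content: string }
interface GenerateResponse {
  content: string;
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
  finishReason: 'stop' | 'length' | 'tool_calls';
}

// Subset of the fields Ollama returns from POST /api/chat when stream is false.
interface OllamaChatResponse {
  message: { role: string; content: string };
  done_reason?: string;
  prompt_eval_count?: number; // input tokens
  eval_count?: number;        // output tokens
}

// Map Ollama's response onto the unified GenerateResponse shape.
function fromOllama(res: OllamaChatResponse): GenerateResponse {
  const promptTokens = res.prompt_eval_count ?? 0;
  const completionTokens = res.eval_count ?? 0;
  return {
    content: res.message.content,
    usage: { promptTokens, completionTokens, totalTokens: promptTokens + completionTokens },
    finishReason: res.done_reason === 'length' ? 'length' : 'stop',
  };
}

// Hypothetical generate() for the planned provider (uses the global fetch of Node 18+).
async function ollamaGenerate(
  baseUrl: string,
  model: string,
  messages: Message[],
  maxTokens?: number,
): Promise<GenerateResponse> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // num_predict caps the number of generated tokens
    body: JSON.stringify({ model, messages, stream: false, options: maxTokens ? { num_predict: maxTokens } : undefined }),
  });
  if (!res.ok) throw new Error(`Ollama HTTP ${res.status}`);
  return fromOllama((await res.json()) as OllamaChatResponse);
}
```

Ollama also serves an OpenAI-compatible API under `/v1`, so another option is an OpenAI client whose `baseURL` points at `http://localhost:11434/v1`.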

Model performance benchmarking

Automatically measure how different models perform on a specific task:

```typescript
class ModelBenchmark {
  async runBenchmark(task: string, models: string[]): Promise<BenchmarkResult> {
    const results = [];

    for (const model of models) {
      const start = Date.now();
      const response = await this.model.generate(task, { model });
      const duration = Date.now() - start;

      results.push({
        model,
        duration,
        cost: this.calculateCost(response.usage, model),
        quality: await this.evaluateQuality(response.content),
      });
    }

    return this.rankResults(results);
  }
}
```
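`rankResults` is not shown above. One possible scoring rule, with hypothetical weights and assuming `evaluateQuality` returns a score between 0 and 1, rewards quality most and then cheaper and faster runs:

```typescript
interface RunResult { model: string; duration: number; cost: number; quality: number }

// Hypothetical weights; quality is assumed to be a 0..1 score.
const WEIGHTS = { quality: 0.6, cost: 0.2, speed: 0.2 };

// Normalize cost and duration against the largest value in this run, so cheaper and
// faster models score closer to 1, then combine with the weights above (best first).
function rankResults(results: RunResult[]): (RunResult & { score: number })[] {
  const maxCost = Math.max(...results.map(r => r.cost), Number.EPSILON);
  const maxDuration = Math.max(...results.map(r => r.duration), Number.EPSILON);
  return results
    .map(r => ({
      ...r,
      score:
        WEIGHTS.quality * r.quality +
        WEIGHTS.cost * (1 - r.cost / maxCost) +
        WEIGHTS.speed * (1 - r.duration / maxDuration),
    }))
    .sort((a, b) => b.score - a.score);
}
```

Normalizing against the most expensive and slowest model in the run keeps the score unitless; the weights are the part worth tuning per task.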

Dynamic pricing optimization

Automatically pick the cheapest suitable model based on current pricing:

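```typescript
class DynamicPricingOptimizer {
  async selectCheapestModel(task: string): Promise<string> {
    const suitableModels = this.getSuitableModels(task);
    const prices = await this.fetchCurrentPrices(suitableModels);
    if (prices.length === 0) {
      throw new Error(`No priced model available for task: ${task}`);
    }
    // Copy before sorting so the fetched array is left untouched
    return [...prices].sort((a, b) => a.price - b.price)[0].model;
  }
}
```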

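Summary

Key points

Unified interface: every model is managed through the Provider interface
Flexible switching: switch models at runtime without restarting
Automatic fallback: when the primary model fails, the backups are tried automatically
Cost optimization: track costs in real time and route each task to a suitable model
High availability: a multi-level fallback chain keeps the service running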

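Design principles

Abstraction over concretion: the Provider interface isolates concrete implementations
Composition over inheritance: multi-model support comes from composing different Providers
Configuration over hard-coding: model selection and fallback strategy are both configurable
Monitoring over guesswork: track cost and performance in real time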

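Best practices (February 2026)

Everyday development: gpt-5-mini or deepseek-chat (cheap and fast)
Code generation: claude-opus-4.6 (strongest at code)
Deep reasoning: gpt-5.2-pro (strongest at reasoning)
Production: configure a multi-level fallback chain (high availability)
Cost-sensitive work: mix cheap and expensive models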


This article was written by 青雲 (echoVic), based on hands-on experience building blade-code. Questions and suggestions are welcome in GitHub Issues.