Part 3 of the blade-code series. From a unified interface to model switching, from cost optimization to fallback strategies: how to build a flexible multi-model AI system.
## Why Support Multiple Models?

When building an AI application, the first question is usually: "Which model should I use?"

- OpenAI GPT-5.2? Powerful, but expensive
- Claude Opus 4.6? Strong at code, but rate-limited
- DeepSeek V3.2? Cheap, but stability is a question mark
- Gemini 2.5? Generous free tier, but quota-capped

My answer: all of them.

blade-code was designed as a multi-model architecture from day one, with seamless switching across 10+ mainstream models. You can choose by task type, budget, or rate limits, and even switch dynamically at runtime.
### The Problems with a Single Model

Relying on a single model causes real trouble:

| Problem | Impact |
|---|---|
| High cost | gpt-5.2-pro output costs $168/M tokens |
| Rate limits | Claude caps requests per minute |
| Outages | When OpenAI goes down, you can only sit and wait |
| Capability gaps | Different tasks call for different models |
| Regional restrictions | Some models are unavailable in certain regions |

A multi-model architecture addresses all of these:
**Cost optimization**: use cheap models for simple tasks and powerful models for complex ones:
```typescript
// Simple task
const summary = await model.generate('Summarize this passage', {
  model: 'deepseek-chat' // $0.28/M input, $0.42/M output
});

// Complex task
const architecture = await model.generate('Design the system architecture', {
  model: 'claude-opus-4.6' // $5/M input, $25/M output
});
```
**High availability**: automatic failover when the primary model goes down:
```typescript
const response = await model.generate(prompt, {
  model: 'gpt-5.2',
  fallback: ['claude-sonnet-4.5', 'deepseek-chat']
});
```
**Task matching**: use the best-suited model for each kind of task:
```typescript
const taskModelMap = {
  'code-generation': 'claude-opus-4.6',
  'translation': 'gpt-5-mini',
  'reasoning': 'gpt-5.2-pro',
  'chat': 'deepseek-chat',
};
```
## Supported Models (February 2026)

blade-code currently supports these models:
| Provider | Model | Highlights | Cost (input/output per M) |
|---|---|---|---|
| OpenAI | gpt-5.2 | Flagship; code and agent tasks | $1.75 / $14 |
| | gpt-5.2-pro | Strongest reasoning | $21 / $168 |
| | gpt-5-mini | Lightweight and fast | $0.25 / $2 |
| | gpt-4.1 | Fine-tunable | $3 / $12 |
| | gpt-4.1-mini | Lightweight fine-tunable | $0.80 / $3.20 |
| | o4-mini | Reasoning model | $4 / $16 |
| Anthropic | claude-opus-4.6 | Strongest agent and code | $5 / $25 |
| | claude-sonnet-4.5 | Best value | $3 / $15 |
| | claude-haiku-4.5 | Fast responses | $1 / $5 |
| Google | gemini-2.5-pro | Long context | $1.25 / $5 |
| | gemini-2.5-flash | Free experimental tier | Free |
| DeepSeek | deepseek-chat (V3.2) | Price/performance king | $0.28 / $0.42 |
| | deepseek-reasoner (V3.2) | Reasoning mode | $0.28 / $0.42 |
| OpenRouter | (aggregates many vendors) | Unified interface | Billed per underlying model |
### How to Choose?

- **Code generation**: claude-opus-4.6 is the strongest; deepseek-chat is the cheapest
- **Deep reasoning**: gpt-5.2-pro is the strongest; deepseek-reasoner offers the best price/performance
- **Everyday chat**: gpt-5-mini or deepseek-chat, cheap and fast
- **Long documents**: gemini-2.5-pro supports very long context
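If you want this guidance in code rather than prose, a small lookup table works. A minimal sketch; the `Scenario` type and `pickModel` helper below are illustrative, not part of blade-code's API:

```typescript
type Scenario = 'code' | 'reasoning' | 'chat' | 'long-context';

// Encodes the recommendations above: best-quality pick vs. budget pick.
const recommended: Record<Scenario, { best: string; budget: string }> = {
  'code':         { best: 'claude-opus-4.6', budget: 'deepseek-chat' },
  'reasoning':    { best: 'gpt-5.2-pro',     budget: 'deepseek-reasoner' },
  'chat':         { best: 'gpt-5-mini',      budget: 'deepseek-chat' },
  'long-context': { best: 'gemini-2.5-pro',  budget: 'gemini-2.5-pro' },
};

function pickModel(scenario: Scenario, budgetMode = false): string {
  const { best, budget } = recommended[scenario];
  return budgetMode ? budget : best;
}
```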
## Unified Interface Design

### Architecture Overview

Every model provider implements the same Provider interface:
```typescript
interface Provider {
  name: string;
  models: string[];
  generate(request: GenerateRequest): Promise<GenerateResponse>;
  stream(request: GenerateRequest): AsyncIterable<StreamChunk>;
  isAvailable(model: string): Promise<boolean>;
  getModelInfo(model: string): ModelInfo;
}

interface GenerateRequest {
  model: string;
  messages: Message[];
  temperature?: number;
  maxTokens?: number;
  tools?: Tool[];
}

interface GenerateResponse {
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  finishReason: 'stop' | 'length' | 'tool_calls';
  toolCalls?: ToolCall[];
}
```
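Once every provider sits behind this interface, adding a vendor is just registration plus lookup. A minimal sketch of the wiring; the registry shape here is an assumption, and blade-code's actual bootstrap may differ:

```typescript
// Register each vendor once; everything downstream talks to Provider only.
const providers = new Map<string, Provider>([
  ['openai', new OpenAIProvider(process.env.OPENAI_API_KEY!)],
  ['anthropic', new AnthropicProvider(process.env.ANTHROPIC_API_KEY!)],
]);

// Resolve a model name to whichever provider serves it.
function resolveProvider(model: string): Provider {
  for (const p of providers.values()) {
    if (p.models.includes(model)) return p;
  }
  throw new Error(`No provider registered for model: ${model}`);
}

const reply = await resolveProvider('gpt-5-mini').generate({
  model: 'gpt-5-mini',
  messages: [{ role: 'user', content: 'Hello' }],
});
```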
### OpenAI Provider Implementation
```typescript
class OpenAIProvider implements Provider {
  name = 'openai';
  models = ['gpt-5.2', 'gpt-5.2-pro', 'gpt-5-mini', 'gpt-4.1', 'o4-mini'];

  private client: OpenAI;

  constructor(apiKey: string) {
    this.client = new OpenAI({ apiKey });
  }

  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    const response = await this.client.chat.completions.create({
      model: request.model,
      messages: request.messages,
      temperature: request.temperature,
      max_tokens: request.maxTokens,
      tools: request.tools,
    });

    return {
      content: response.choices[0].message.content || '',
      usage: {
        promptTokens: response.usage?.prompt_tokens || 0,
        completionTokens: response.usage?.completion_tokens || 0,
        totalTokens: response.usage?.total_tokens || 0,
      },
      finishReason: response.choices[0].finish_reason as any,
      toolCalls: response.choices[0].message.tool_calls,
    };
  }

  async *stream(request: GenerateRequest): AsyncIterable<StreamChunk> {
    const stream = await this.client.chat.completions.create({
      model: request.model,
      messages: request.messages,
      stream: true,
    });

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;
      if (delta?.content) {
        yield { type: 'content', content: delta.content };
      }
      if (delta?.tool_calls) {
        yield { type: 'tool_calls', toolCalls: delta.tool_calls };
      }
    }
  }

  async isAvailable(model: string): Promise<boolean> {
    try {
      await this.client.models.retrieve(model);
      return true;
    } catch {
      return false;
    }
  }

  getModelInfo(model: string): ModelInfo {
    const infoMap: Record<string, ModelInfo> = {
      'gpt-5.2': {
        contextWindow: 256000,
        maxOutputTokens: 32768,
        costPer1MTokens: { input: 1.75, output: 14 },
        capabilities: ['text', 'vision', 'tools', 'agents'],
      },
      'gpt-5-mini': {
        contextWindow: 128000,
        maxOutputTokens: 16384,
        costPer1MTokens: { input: 0.25, output: 2 },
        capabilities: ['text', 'vision', 'tools'],
      },
      // Entries for gpt-5.2-pro, gpt-4.1, and o4-mini omitted for brevity.
    };
    const info = infoMap[model];
    if (!info) throw new Error(`No model info registered for ${model}`);
    return info;
  }
}
```
### Anthropic Provider Implementation

Anthropic's message format differs from OpenAI's, so a conversion step is needed:
```typescript
class AnthropicProvider implements Provider {
  name = 'anthropic';
  models = [
    'claude-opus-4.6',
    'claude-sonnet-4.5',
    'claude-haiku-4.5',
  ];

  private client: Anthropic;

  constructor(apiKey: string) {
    this.client = new Anthropic({ apiKey });
  }

  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    // Convert message format: OpenAI-style -> Anthropic
    const { system, messages } = this.convertMessages(request.messages);

    const response = await this.client.messages.create({
      model: request.model,
      system,
      messages,
      max_tokens: request.maxTokens || 8192,
      temperature: request.temperature,
      tools: request.tools,
    });

    return {
      content: this.extractContent(response.content),
      usage: {
        promptTokens: response.usage.input_tokens,
        completionTokens: response.usage.output_tokens,
        totalTokens: response.usage.input_tokens + response.usage.output_tokens,
      },
      finishReason: response.stop_reason as any,
      toolCalls: this.extractToolCalls(response.content),
    };
  }

  private convertMessages(messages: Message[]): {
    system: string;
    messages: Anthropic.MessageParam[];
  } {
    // Anthropic takes system prompts as a separate parameter, so pull them out
    const systemMessages = messages.filter(m => m.role === 'system');
    const system = systemMessages.map(m => m.content).join('\n\n');

    // Convert the remaining messages
    const anthropicMessages = messages
      .filter(m => m.role !== 'system')
      .map(m => ({
        role: m.role as 'user' | 'assistant',
        content: m.content,
      }));

    return { system, messages: anthropicMessages };
  }

  // stream(), isAvailable(), and getModelInfo() are analogous to the
  // OpenAIProvider versions and are omitted here.
}
```
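The extractContent and extractToolCalls helpers referenced above are omitted from the listing. Roughly, they flatten Anthropic's array of typed content blocks; a sketch of what they might look like, where the block type names follow @anthropic-ai/sdk conventions and the ToolCall shape is this article's own assumption:

```typescript
// Inside AnthropicProvider:
private extractContent(blocks: Anthropic.ContentBlock[]): string {
  // Concatenate the text blocks; tool_use blocks are handled separately.
  return blocks
    .filter((b): b is Anthropic.TextBlock => b.type === 'text')
    .map(b => b.text)
    .join('');
}

private extractToolCalls(blocks: Anthropic.ContentBlock[]): ToolCall[] | undefined {
  // Assumed ToolCall shape: { id, name, arguments } with JSON-encoded arguments.
  const calls = blocks
    .filter((b): b is Anthropic.ToolUseBlock => b.type === 'tool_use')
    .map(b => ({ id: b.id, name: b.name, arguments: JSON.stringify(b.input) }));
  return calls.length > 0 ? calls : undefined;
}
```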
## Model Switching and Fallback

### Runtime Switching

You can switch models at any time:
```typescript
class ModelManager {
  private currentModel: string;
  private providers: Map<string, Provider>;

  switchModel(model: string): void {
    const provider = this.getProviderForModel(model);
    if (!provider) {
      throw new Error(`Model ${model} not supported`);
    }
    this.currentModel = model;
    this.logger.info(`Switched to model: ${model}`);
  }

  private getProviderForModel(model: string): Provider | undefined {
    for (const provider of this.providers.values()) {
      if (provider.models.includes(model)) {
        return provider;
      }
    }
    return undefined;
  }
}
```
From the CLI:
```bash
# Specify at startup
blade --model=claude-opus-4.6 "optimize this code"

# Switch at runtime
> /model gpt-5.2
✅ Switched to gpt-5.2
> /model deepseek-chat
✅ Switched to deepseek-chat
```
### Automatic Fallback

When the primary model fails, backup models are tried automatically:
```typescript
class ModelManager {
  async generateWithFallback(
    prompt: string,
    options: GenerateOptions
  ): Promise<GenerateResponse> {
    const models = [
      options.model || this.currentModel,
      ...(options.fallback || this.defaultFallbackChain),
    ];

    let lastError: Error | undefined;

    for (const model of models) {
      try {
        this.logger.info(`Trying model: ${model}`);
        const response = await this.generate(prompt, { ...options, model });
        return response;
      } catch (error) {
        this.logger.warn(`Model ${model} failed:`, error);
        lastError = error as Error;
        if (!this.shouldRetry(error)) {
          throw error;
        }
      }
    }

    throw new Error(`All models failed. Last error: ${lastError?.message}`);
  }

  private shouldRetry(error: any): boolean {
    // Rate limited or service unavailable: move on to the next model
    if (error.status === 429 || error.status === 503) return true;
    // Auth failure or invalid request: retrying won't help
    if (error.status === 401 || error.status === 400) return false;
    return true;
  }
}
```
Configuring the fallback chain:
```typescript
const config = {
  defaultModel: 'claude-opus-4.6',
  fallbackChain: [
    'gpt-5.2',
    'deepseek-chat',
    'gemini-2.5-flash',
  ],
};
```
### Smart Routing

Automatically pick a model based on the task type:
```typescript
class ModelRouter {
  private taskModelMap: Record<string, string> = {
    'code-generation': 'claude-opus-4.6',
    'code-review': 'claude-opus-4.6',
    'translation': 'gpt-5-mini',
    'reasoning': 'gpt-5.2-pro',
    'chat': 'deepseek-chat',
    'summarization': 'gpt-5-mini',
  };

  selectModel(task: string, userPreference?: string): string {
    if (userPreference) return userPreference;
    return this.taskModelMap[task] || this.defaultModel;
  }

  async detectTaskType(prompt: string): Promise<string> {
    // The patterns deliberately match both Chinese and English phrasing
    if (/写代码|生成代码|implement|create function/i.test(prompt)) {
      return 'code-generation';
    }
    if (/审查|review|check code/i.test(prompt)) {
      return 'code-review';
    }
    if (/翻译|translate/i.test(prompt)) {
      return 'translation';
    }
    if (/推理|分析|reasoning|analyze/i.test(prompt)) {
      return 'reasoning';
    }
    return 'chat';
  }
}
```
## Cost Optimization

### Cost Tracking

Track the cost of every request in real time:
```typescript
class CostTracker {
  private totalCost = 0;
  private requestCount = 0;
  private costByModel: Map<string, number> = new Map();

  trackUsage(model: string, usage: Usage): void {
    const modelInfo = this.modelManager.getModelInfo(model);
    const cost = this.calculateCost(usage, modelInfo);

    this.totalCost += cost;
    this.requestCount += 1;
    this.costByModel.set(
      model,
      (this.costByModel.get(model) || 0) + cost
    );

    this.logger.info(`Cost: $${cost.toFixed(4)} (Total: $${this.totalCost.toFixed(4)})`);
  }

  private calculateCost(usage: Usage, modelInfo: ModelInfo): number {
    const inputCost = (usage.promptTokens / 1_000_000) * modelInfo.costPer1MTokens.input;
    const outputCost = (usage.completionTokens / 1_000_000) * modelInfo.costPer1MTokens.output;
    return inputCost + outputCost;
  }

  getReport(): CostReport {
    return {
      totalCost: this.totalCost,
      costByModel: Object.fromEntries(this.costByModel),
      averageCostPerRequest: this.totalCost / this.requestCount,
    };
  }
}
```
Sample output:
```yaml
💰 Cost Report:
  Total: $1.85
  By Model:
    - claude-opus-4.6: $1.20 (65%)
    - gpt-5-mini: $0.35 (19%)
    - deepseek-chat: $0.30 (16%)
  Average per request: $0.09
```
### Budget Control

Set daily and monthly spending caps:
```typescript
class BudgetController {
  private dailyLimit: number;
  private monthlyLimit: number;
  // Assumed to be reset by a scheduler at day/month boundaries
  private dailySpent = 0;
  private monthlySpent = 0;

  async checkBudget(estimatedCost: number): Promise<boolean> {
    if (this.dailySpent + estimatedCost > this.dailyLimit) {
      throw new Error(`Daily budget exceeded: $${this.dailyLimit}`);
    }
    if (this.monthlySpent + estimatedCost > this.monthlyLimit) {
      throw new Error(`Monthly budget exceeded: $${this.monthlyLimit}`);
    }
    return true;
  }

  recordSpending(cost: number): void {
    this.dailySpent += cost;
    this.monthlySpent += cost;
  }
}
```
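Enforcement needs a cost estimate before the request goes out. A rough sketch of the integration, assuming the ModelManager and BudgetController instances above; the 4-characters-per-token heuristic is only an approximation:

```typescript
async function generateWithinBudget(prompt: string, model: string): Promise<GenerateResponse> {
  const info = modelManager.getModelInfo(model);

  // Pre-estimate: ~4 characters per token; assume output roughly equals input.
  const estTokens = Math.ceil(prompt.length / 4);
  const estimatedCost =
    (estTokens / 1_000_000) *
    (info.costPer1MTokens.input + info.costPer1MTokens.output);

  await budget.checkBudget(estimatedCost); // throws if a cap would be exceeded

  const response = await modelManager.generate(prompt, { model });

  // Record the actual spend from the returned usage, not the estimate.
  const actualCost =
    (response.usage.promptTokens / 1_000_000) * info.costPer1MTokens.input +
    (response.usage.completionTokens / 1_000_000) * info.costPer1MTokens.output;
  budget.recordSpending(actualCost);

  return response;
}
```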
### Money-Saving Tips

**1. Use cheap models for simple tasks**
```typescript
// ❌ Wasteful
const summary = await model.generate('Summarize this passage', {
  model: 'gpt-5.2' // $1.75/M input
});

// ✅ Economical
const summary = await model.generate('Summarize this passage', {
  model: 'deepseek-chat' // $0.28/M input
});
```
**2. Use prompt caching**

Both OpenAI and Anthropic support prompt caching; on a cache hit, input cost drops by roughly 90%:
```typescript
// OpenAI: cached input $0.175/M (vs $1.75/M)
// Anthropic: cached read $0.50/M (vs $5/M)
// DeepSeek: cache hit $0.028/M (vs $0.28/M)
```
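OpenAI and DeepSeek apply caching automatically to long, repeated prompt prefixes, so the client needs no changes there. Anthropic requires opting in per content block; a minimal sketch with the raw SDK, where the anthropic client instance and LONG_SYSTEM_PROMPT are assumed:

```typescript
// Mark the large, stable prefix (e.g. the system prompt) as cacheable so
// repeat requests read it at the discounted cached rate.
const response = await anthropic.messages.create({
  model: 'claude-opus-4.6',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: LONG_SYSTEM_PROMPT, // stable across requests
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'First question about the codebase' }],
});
```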
**3. Compress the context**
```typescript
// ❌ Wasteful: send the full history
const response = await model.generate(prompt, {
  messages: allMessages // possibly thousands of messages
});

// ✅ Economical: keep only the most recent
const response = await model.generate(prompt, {
  messages: allMessages.slice(-10)
});
```
## Case Studies

### Case 1: Cost-Sensitive Code Generation

Requirement: generate a lot of code on a tight budget.

Strategy: draft with deepseek-chat (cheap), then review and improve with claude-opus-4.6 (accurate).
```typescript
async function generateCodeWithBudget(task: string): Promise<string> {
  // Step 1: draft with the cheap model
  const draft = await model.generate(task, {
    model: 'deepseek-chat', // $0.28/M input
  });

  // Step 2: review with the strong model
  const review = await model.generate(
    `Review and improve this code:\n${draft.content}`,
    { model: 'claude-opus-4.6' } // $5/M input
  );

  return review.content;
}
```
Cost comparison:

- Claude Opus 4.6 end to end: $5/M input
- Mixed strategy: roughly $1/M input (about 80% cheaper)
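The ~$1/M figure is a blend: the Opus pass only re-reads the much shorter draft, not the full task context. A back-of-the-envelope check, with illustrative numbers:

```typescript
// Input cost only; draftShare is the fraction of total tokens the review
// pass re-reads. Both numbers are illustrative.
const draftShare = 0.15;
const blended = 0.28 + draftShare * 5; // deepseek draft + partial opus review
console.log(`~$${blended.toFixed(2)}/M input`); // ≈ $1.03/M
```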
### Case 2: High-Availability Production

Requirement: the service must not go down, even if a model provider does.

Strategy: configure a multi-level fallback chain.
```typescript
const config = {
  primaryModel: 'claude-opus-4.6',
  fallbackChain: [
    'gpt-5.2',            // first backup
    'deepseek-chat',      // second backup
    'gemini-2.5-flash',   // third backup (free)
  ],
  retryConfig: {
    maxRetries: 3,
    backoffMs: 1000,
  },
};

async function generateWithHighAvailability(prompt: string): Promise<string> {
  return await model.generateWithFallback(prompt, config);
}
```
Availability:

- Single model: 99.9%
- Four-model fallback chain: 99.99%+
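The intuition: a request only fails outright when every model in the chain is down at the same time. Assuming independent outages (optimistic, since vendors can share infrastructure), a quick estimate:

```typescript
// Availability of an n-model fallback chain under independent outages.
// The 99.9% per-model figure is illustrative.
function chainAvailability(perModel: number, n: number): number {
  return 1 - Math.pow(1 - perModel, n);
}

console.log(chainAvailability(0.999, 1)); // 0.999
console.log(chainAvailability(0.999, 4)); // ≈ 0.999999999999, comfortably above 99.99%
```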
### Case 3: Smart Task Routing

Requirement: automatically pick the best model for each task.
```typescript
async function smartGenerate(prompt: string): Promise<string> {
  const taskType = await router.detectTaskType(prompt);
  const model = router.selectModel(taskType);

  console.log(`Task: ${taskType}, Model: ${model}`);
  return await modelManager.generate(prompt, { model });
}

// Examples
await smartGenerate('Implement a quicksort');
// → Task: code-generation, Model: claude-opus-4.6

await smartGenerate('Translate this passage');
// → Task: translation, Model: gpt-5-mini

await smartGenerate('Analyze the time complexity of this algorithm');
// → Task: reasoning, Model: gpt-5.2-pro
```
## Roadmap

### Local Model Support

Support for locally hosted open-source models is planned:
```typescript
// Ollama integration
const localProvider = new OllamaProvider({
  baseUrl: 'http://localhost:11434',
  models: ['llama4', 'codellama', 'qwen2.5'],
});

// Hybrid usage: local + cloud
const response = await model.generate(prompt, {
  model: 'llama4',        // local (free)
  fallback: ['gpt-5.2'],  // cloud backup (paid)
});
```
### Model Benchmarking

Automatically test how different models perform on a given task:
```typescript
class ModelBenchmark {
  async runBenchmark(task: string, models: string[]): Promise<BenchmarkResult> {
    const results = [];

    for (const model of models) {
      const start = Date.now();
      const response = await this.model.generate(task, { model });
      const duration = Date.now() - start;

      results.push({
        model,
        duration,
        cost: this.calculateCost(response.usage, model),
        quality: await this.evaluateQuality(response.content),
      });
    }

    return this.rankResults(results);
  }
}
```
### Dynamic Pricing Optimization

Automatically pick the cheapest model based on real-time pricing:
```typescript
class DynamicPricingOptimizer {
  async selectCheapestModel(task: string): Promise<string> {
    const suitableModels = this.getSuitableModels(task);
    const prices = await this.fetchCurrentPrices(suitableModels);
    return prices.sort((a, b) => a.price - b.price)[0].model;
  }
}
```
## Summary

### Key Takeaways

- **Unified interface**: all models are managed through a single `Provider` interface
- **Flexible switching**: switch models dynamically at runtime, no restart needed
- **Automatic fallback**: backup models are tried automatically when the primary fails
- **Cost optimization**: track spend in real time and pick models intelligently
- **High availability**: a multi-level fallback chain keeps the service up
### Design Principles

- **Abstraction over specifics**: the Provider interface isolates concrete implementations
- **Composition over inheritance**: multi-model support comes from composing providers
- **Configuration over hard-coding**: model choice and fallback strategy are all configurable
- **Monitoring over guessing**: track cost and performance in real time
### Best Practices (February 2026)

- **Everyday development**: gpt-5-mini or deepseek-chat (cheap and fast)
- **Code generation**: claude-opus-4.6 (strongest at code)
- **Deep reasoning**: gpt-5.2-pro (strongest reasoning)
- **Production**: configure a multi-level fallback chain (high availability)
- **Cost-sensitive work**: mix cheap and expensive models
Written by 青雲 (echoVic), based on practical experience building blade-code. Questions and suggestions are welcome in the GitHub Issues.