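Error Handling and Fault Tolerance: Teaching AI to "Learn from Failure"

Why Does Error Handling Determine an Agent's Usability?

A Maddening Scenario

Imagine our AI assistant is running and the user asks: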

User: "Check the weather in Beijing for me"
AI: calls get_weather("北京")

At this moment the network hiccups and the API times out. What happens to our Agent?

A poor implementation:

The program crashes and the user sees a red error stack trace:
TypeError: Cannot read property 'temperature' of undefined
    at processWeatherResult (agent.js:156:23)

A good implementation:

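AI: "The network seems to be having trouble, retrying..."
(2 seconds later)
AI: "Done! Beijing is sunny today, 22℃"

The gap in user experience: one leaves the user thinking "this program is garbage", the other "this AI is really smart".

Why Is Error Handling Critical?

Dimension	Without error handling	With error handling
User experience	Crashes, confusion	Smooth, professional
Availability	Gives up on the first failure	Recovers automatically; success rate up 30%+
Debugging difficulty	Problems cannot be located	Standardized error messages, fast diagnosis
Perceived intelligence	Feels like a bug	Feels like a smart assistant

Classifying and Identifying Errors

The Five Families of Tool Errors

Error type	Example	Characteristic	What the AI can do
Parameter error	Misspelled city name, missing required parameter	Parameters do not meet requirements	Correct the parameters and retry
Business error	User ID does not exist, file already deleted	Data not found	Suggest an alternative
Network error	Timeout, connection failure, DNS resolution failure	Transient problem	Retry automatically
Permission error	No permission to access a file, invalid API key	Permanent problem	Inform the user
Rate-limit error	Too many requests, quota exhausted	Temporary restriction	Wait, then retry

Deciding Whether an Error Is Recoverable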

// Recoverability classification for errors
interface ErrorClassification {
  recoverable: boolean
  action: 'retry' | 'fix-params' | 'ask-user' | 'abort'
  maxRetries?: number
  retryDelay?: number  // milliseconds
}

const errorStrategies: Record<string, ErrorClassification> = {
  // Recoverable errors
  'NETWORK_TIMEOUT': { recoverable: true, action: 'retry', maxRetries: 3, retryDelay: 1000 },
  'RATE_LIMIT': { recoverable: true, action: 'retry', maxRetries: 5, retryDelay: 2000 },
  'INVALID_PARAM': { recoverable: true, action: 'fix-params', maxRetries: 2 },
  
  // Non-recoverable errors
  'PERMISSION_DENIED': { recoverable: false, action: 'abort' },
  'RESOURCE_NOT_FOUND': { recoverable: false, action: 'ask-user' },
  'AUTH_FAILED': { recoverable: false, action: 'abort' }
}
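
A lookup over a table like this also needs a defined answer for error codes it has never seen. The sketch below shows one way to handle that, assuming an unknown code should not be retried automatically and is handed back to the user. `classifyError` and `DEFAULT_STRATEGY` are hypothetical names, and the table is a trimmed copy of the one above stored in a `Map`.

```typescript
type Action = 'retry' | 'fix-params' | 'ask-user' | 'abort'

interface Strategy {
  recoverable: boolean
  action: Action
  maxRetries?: number
  retryDelay?: number // milliseconds
}

// Trimmed copy of the strategy table above. A Map avoids accidental hits on
// inherited object keys such as "constructor".
const strategies = new Map<string, Strategy>([
  ['NETWORK_TIMEOUT', { recoverable: true, action: 'retry', maxRetries: 3, retryDelay: 1000 }],
  ['INVALID_PARAM', { recoverable: true, action: 'fix-params', maxRetries: 2 }],
  ['PERMISSION_DENIED', { recoverable: false, action: 'abort' }]
])

// Assumption: codes missing from the table are treated as non-recoverable
const DEFAULT_STRATEGY: Strategy = { recoverable: false, action: 'ask-user' }

function classifyError(code: string): Strategy {
  return strategies.get(code) ?? DEFAULT_STRATEGY
}

console.log(classifyError('NETWORK_TIMEOUT').action) // "retry"
console.log(classifyError('SOMETHING_NEW').action)   // "ask-user"
```

Defaulting to the conservative branch means a newly introduced error code cannot trigger a retry loop until someone classifies it explicitly.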

Designing Standardized Error Messages

A Structured Template for Error Messages

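// Error categories; 'unknown' is the fallback for errors that cannot be classified
type ErrorType = 'param' | 'business' | 'network' | 'permission' | 'rate_limit' | 'unknown'

/**
 * Standardized error interface
 * Lets the AI understand an error and act on it
 */
interface StandardizedError {
  error: {
    // Machine-readable error code, used for programmatic decisions
    code: string
    
    // Human-readable description; the AI relays this to the user
    message: string
    
    // Error category (see ErrorType above)
    type: ErrorType
    
    // Whether the error is recoverable (the AI can retry or correct the call)
    recoverable: boolean
    
    // Remediation hint (written for the AI)
    suggestion?: string
    
    // Extra details (for debugging)
    details?: any
    
    // Suggested wait in seconds when rate limited
    retryAfter?: number
    
    // Corrected parameters, when a fix can be suggested
    correctedParams?: Record<string, any>
  }
}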
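
Because the tools in this article return errors as values instead of throwing, the caller needs a reliable way to tell an error payload from a normal result. The following is a minimal sketch of such a check, assuming only the three required fields matter for the decision. `isToolError` and the two interface names are hypothetical.

```typescript
interface ToolErrorBody {
  code: string
  message: string
  recoverable: boolean
}

interface ToolErrorEnvelope {
  error: ToolErrorBody
}

// Type guard: true only for values shaped like { error: { code, message, recoverable } }
function isToolError(value: unknown): value is ToolErrorEnvelope {
  if (typeof value !== 'object' || value === null) return false
  const err = (value as { error?: unknown }).error
  if (typeof err !== 'object' || err === null) return false
  const body = err as Record<string, unknown>
  return (
    typeof body.code === 'string' &&
    typeof body.message === 'string' &&
    typeof body.recoverable === 'boolean'
  )
}

const failure: unknown = {
  error: { code: 'INVALID_CITY', message: 'city not supported', recoverable: true }
}
const success: unknown = { city: '北京', temperature: 22 }

console.log(isToolError(failure)) // true
console.log(isToolError(success)) // false
```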

Good Errors vs Bad Errors

A bad error (the AI cannot make sense of it)

Error: Cannot read property 'name' of undefined
    at getWeather (weather.js:42:23)
    at processRequest (agent.js:156:15)

A good error (the AI can understand it and act)

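{
  "error": {
    "code": "USER_NOT_FOUND",
    "message": "User ID 'abc123' does not exist in the system",
    "type": "business",
    "recoverable": true,
    "suggestion": "Check that the user ID is correct. If you need help, ask the user for the correct ID or search by email",
    "details": {
      "invalidId": "abc123",
      "validFormat": "user_xxxxxxxx"
    }
  }
}

Once the AI sees this error, it understands that:

This is a business error (the user does not exist)
It is recoverable (the parameters can be corrected and the call retried)
It knows what to suggest to the user (check the ID format)

Handling Different Error Types in Practice

Parameter Errors: Let the AI Correct and Retry

The AI may pass a city name in the wrong format. The tool expects "北京", for example, but the user typed "beijing" or the nickname "帝都" and the AI forwarded it unchanged. We can add a city-name alias map (synonym handling) so that common variants resolve directly, and return a structured error for everything else so the AI can correct the parameter and retry: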

async function getWeather(city: string): Promise<any> {
  // Whitelist of supported cities
  const validCities = ['北京', '上海', '深圳', '广州', '杭州']
  
  // City-name aliases (synonym handling)
  const cityAliases: Record<string, string> = {
    'beijing': '北京',
    'bj': '北京',
    '帝都': '北京',
    'shanghai': '上海',
    'sh': '上海',
    '魔都': '上海',
    'sz': '深圳',
    '鹏城': '深圳'
  }
  
  // Try to resolve an alias first
  const normalizedCity = cityAliases[city.toLowerCase()] || city
  
  if (!validCities.includes(normalizedCity)) {
    // Return a structured error that tells the AI how to correct the call
    return {
      error: {
        code: 'INVALID_CITY',
        message: `"${city}" is not in the list of cities supported for weather lookup`,
        type: 'param',
        recoverable: true,
        suggestion: `Use one of the following city names: ${validCities.join('、')}. If the user gave an alias, use the official name.`,
        details: {
          invalidCity: city,
          validCities,
          // Plain function call: getWeather is not a method, so there is no `this` here
          suggestedCorrection: findClosestMatch(city, validCities)
        }
      }
    }
  }
  
  // Normal path: return weather data (hard-coded sample values)
  return {
    city: normalizedCity,
    temperature: 22,
    condition: '晴',
    humidity: 45
  }
}

// Helper: find the closest matching candidate
function findClosestMatch(input: string, candidates: string[]): string | null {
  // Simple implementation: exact match only
  // A real project could use a pinyin library or a fuzzy-matching algorithm
  if (candidates.includes(input)) return input
  return null
}
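
The helper above only recognizes exact matches. As one possible direction for the fuzzy matching its comment mentions, here is a sketch based on Levenshtein edit distance. This is a swapped-in technique, not the article's implementation: `editDistance`, `closestByEditDistance`, and the `maxDistance` threshold of 2 are all assumptions, and the candidates are romanized names because edit distance says little about two-character Chinese names.

```typescript
// Levenshtein distance via dynamic programming
// (insertion, deletion and substitution each cost 1)
function editDistance(a: string, b: string): number {
  let prev: number[] = []
  for (let j = 0; j <= b.length; j++) prev.push(j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
    }
    prev = curr
  }
  return prev[b.length]
}

// Returns the candidate with the smallest distance, or null when even the
// best candidate is further away than maxDistance
function closestByEditDistance(
  input: string,
  candidates: string[],
  maxDistance = 2 // assumed threshold; tune it against real inputs
): string | null {
  let best: string | null = null
  let bestDistance = maxDistance + 1
  const needle = input.toLowerCase()
  for (const candidate of candidates) {
    const d = editDistance(needle, candidate.toLowerCase())
    if (d < bestDistance) {
      best = candidate
      bestDistance = d
    }
  }
  return best
}

const romanizedCities = ['beijing', 'shanghai', 'shenzhen']
console.log(closestByEditDistance('bejing', romanizedCities)) // "beijing"
console.log(closestByEditDistance('tokyo', romanizedCities))  // null
```

Returning `null` beyond the threshold matters here: a wrong suggestion would send the AI to the wrong city, which is worse than no suggestion.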

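When the AI sees this error, it will:

Recognize that the error type is a parameter error
Read the suggestion and learn that it should use the official city name
Correct the parameter and call the tool again
Succeed on the second call

Network Errors: Automatic Retry

For network errors, wrap the tool call in an automatic retry that uses exponential backoff, so that retries do not add to the load on the server: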

async function callWithRetry<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number
    baseDelay?: number
    onRetry?: (attempt: number, error: Error) => void
  } = {}
): Promise<T> {
  const { maxRetries = 3, baseDelay = 1000, onRetry } = options
  
  let lastError: Error
  
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn()
    } catch (error: any) {
      lastError = error
      
      // Last attempt failed
      if (attempt === maxRetries) {
        break
      }
      
      // Decide whether a retry is worthwhile (only network errors are retried)
      const shouldRetry = error.type === 'network' || 
                          error.type === 'timeout' ||
                          error.code === 'ECONNRESET'
      
      if (!shouldRetry) {
        throw error
      }
      
      // Exponential backoff delay
      const delay = baseDelay * Math.pow(2, attempt)
      if (onRetry) onRetry(attempt + 1, error)
      
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
  
  throw lastError!
}

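// Usage example
// Note: callWithRetry only retries when fn throws. A tool that returns
// { error: ... } as a value must be adapted to throw before it can benefit.
async function getWeatherWithRetry(city: string): Promise<any> {
  return callWithRetry(
    () => getWeather(city),
    {
      maxRetries: 3,
      baseDelay: 1000,
      onRetry: (attempt, error) => {
        console.log(`⚠️ Retry ${attempt}, error: ${error.message}`)
      }
    }
  )
}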
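
One gap is worth spelling out: `callWithRetry` reacts only to thrown errors, while `getWeather` earlier in this article returns `{ error: ... }` as a normal value. The sketch below shows one way to bridge the two, assuming the structured error shape used throughout. `ToolCallError`, `unwrapOrThrow`, and the simulated `flakyWeather` tool are hypothetical names introduced for illustration.

```typescript
interface ToolFailure {
  error: { code: string; message: string; type: string; recoverable: boolean }
}

type Weather = { city: string; temperature: number }

// Error subclass that carries the structured fields a retry check inspects
class ToolCallError extends Error {
  code: string
  type: string
  recoverable: boolean
  constructor(body: ToolFailure['error']) {
    super(body.message)
    this.name = 'ToolCallError'
    this.code = body.code
    this.type = body.type
    this.recoverable = body.recoverable
  }
}

// Loose check for the sketch: any object with an "error" key counts as a failure
function isFailure(value: unknown): value is ToolFailure {
  return typeof value === 'object' && value !== null && 'error' in value
}

// Returns the value unchanged on success; throws ToolCallError on an error payload
function unwrapOrThrow<T>(result: T | ToolFailure): T {
  if (isFailure(result)) throw new ToolCallError((result as ToolFailure).error)
  return result as T
}

// Simulated tool: returns a network error payload twice, then succeeds
let calls = 0
async function flakyWeather(): Promise<Weather | ToolFailure> {
  calls++
  if (calls < 3) {
    return {
      error: { code: 'NETWORK_TIMEOUT', message: 'request timed out', type: 'network', recoverable: true }
    }
  }
  return { city: '北京', temperature: 22 }
}

// Minimal retry loop for the demo; callWithRetry above would take its place
async function demo(): Promise<Weather> {
  for (;;) {
    try {
      return unwrapOrThrow<Weather>(await flakyWeather())
    } catch (e) {
      const err = e as ToolCallError
      if (err.type !== 'network' || calls >= 5) throw err
    }
  }
}

demo().then(w => console.log(w.city, w.temperature, `after ${calls} calls`))
// prints: 北京 22 after 3 calls
```

With this adapter, the call site from the usage example would become `() => getWeather(city).then(unwrapOrThrow)`.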

Rate-Limit Errors: Smart Waiting

When a rate limit is hit, the tool can return a suggested wait time and let the AI decide when to retry:

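async function callRateLimitedAPI(endpoint: string): Promise<any> {
  // Simulated rate-limit check (checkRateLimit is defined at the bottom)
  const rateLimitInfo = checkRateLimit(endpoint)
  
  if (rateLimitInfo.isLimited) {
    return {
      error: {
        code: 'RATE_LIMIT_EXCEEDED',
        message: `API call rate exceeded, current limit: ${rateLimitInfo.limit}/minute`,
        type: 'rate_limit',
        recoverable: true,
        retryAfter: rateLimitInfo.resetIn,  // seconds
        suggestion: `Wait ${rateLimitInfo.resetIn} seconds and retry. To call more often, upgrade the plan.`,
        details: {
          current: rateLimitInfo.current,
          limit: rateLimitInfo.limit,
          resetIn: rateLimitInfo.resetIn
        }
      }
    }
  }
  
  // Normal call (callAPI is assumed to be defined elsewhere)
  return await callAPI(endpoint)
}

// Sliding-window rate limiter
class RateLimiter {
  private requests: number[] = []
  private limit: number = 60  // 60 requests per minute
  private window: number = 60000  // 1-minute window, in milliseconds
  
  check(): { isLimited: boolean; current: number; limit: number; resetIn: number } {
    const now = Date.now()
    
    // Drop requests that have fallen out of the window
    this.requests = this.requests.filter(timestamp => now - timestamp < this.window)
    
    const current = this.requests.length
    const isLimited = current >= this.limit
    
    // Seconds until the oldest request expires, rounded up to a whole second
    const oldestRequest = this.requests[0]
    const resetIn = oldestRequest ? Math.max(0, Math.ceil((oldestRequest + this.window - now) / 1000)) : 0
    
    return { isLimited, current, limit: this.limit, resetIn }
  }
  
  record(): void {
    this.requests.push(Date.now())
  }
}

// Glue between the two pieces above. Assumption: a single limiter is shared
// by all endpoints, so the endpoint argument is not used in this sketch.
const rateLimiter = new RateLimiter()

function checkRateLimit(endpoint: string) {
  const info = rateLimiter.check()
  if (!info.isLimited) rateLimiter.record()
  return info
}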

Permission Errors: Graceful Degradation

Permission errors are non-recoverable, but we can still offer alternatives:

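import { promises as fs } from 'fs'

async function readFileSafely(filePath: string): Promise<any> {
  try {
    // Try to read the file
    const content = await fs.readFile(filePath, 'utf-8')
    return { success: true, content }
  } catch (error: any) {
    // Permission error
    if (error.code === 'EACCES') {
      return {
        error: {
          code: 'PERMISSION_DENIED',
          message: `Cannot read file "${filePath}": insufficient permissions`,
          type: 'permission',
          recoverable: false,  // Non-recoverable: the AI should not retry
          suggestion: `Try one of these alternatives:
1. Check the file permissions
2. Run the program with administrator privileges
3. Choose another file that is readable
4. If the file is sensitive, open it in the file manager first`,
          details: {
            path: filePath,
            requiredPermission: 'read',
            currentUser: process.env.USER
          }
        }
      }
    }
    
    // File does not exist
    if (error.code === 'ENOENT') {
      return {
        error: {
          code: 'FILE_NOT_FOUND',
          message: `File "${filePath}" does not exist`,
          type: 'business',
          recoverable: true,  // Another path can be tried
          suggestion: `Check that the file path is correct. If you need help, you can:
1. List the files in the current directory
2. Ask the user for the correct file path`,
          details: { path: filePath }
        }
      }
    }
    
    // Anything else is unexpected: let it propagate
    throw error
  }
}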

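Designing the AI's Retry Strategy

Retry Guidance in the System Prompt

We can put error-handling rules into the system prompt in advance, so the AI knows how to respond to errors: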

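## Error Handling Rules

When a tool returns an error, handle it according to these rules:

### 1. Parameter errors (INVALID_PARAM)
- Read the `suggestion` field to learn how to correct the call
- Correct the parameters and retry immediately
- If the call still fails after 2 corrections, tell the user

### 2. Network errors (NETWORK_TIMEOUT)
- Retry automatically, at most 3 times
- Increase the interval between retries (1 second, 2 seconds, 4 seconds)
- After 3 failures, tell the user "The network is unstable, please try again later"

### 3. Rate-limit errors (RATE_LIMIT)
- Wait the number of seconds given in the `retryAfter` field
- Retry automatically after waiting
- If rate limited 3 times in a row, tell the user "Requests are too frequent"

### 4. Permission errors (PERMISSION_DENIED)
- **Do not retry**; tell the user directly
- Offer the alternatives from `suggestion`

### 5. Business errors (NOT_FOUND)
- Tell the user the resource could not be found
- Ask whether to try a different identifier
- Do not repeat the same query

Boundary Conditions for Retries

Condition	Strategy	Code
Retry limit	At most 3 retries	if (attempt >= 3) break
Retry interval	Exponential backoff: 1s, 2s, 4s	delay = 1000 * Math.pow(2, attempt)
Timeout	No single call longer than 10 seconds	Promise.race([call, timeout(10000)])
User interruption	Allow the user to cancel	AbortController
Error type check	Retry only recoverable errors	if (!error.recoverable) throw error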
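
The timeout and cancellation rows in the table can be combined into a single helper. The sketch below is one possible shape, not the article's implementation: `withTimeout` and `TimeoutError` are hypothetical names, and marking a timeout with `type = 'network'` is an assumption made so that the retry checks shown earlier would treat it as retryable.

```typescript
class TimeoutError extends Error {
  type = 'network' // assumption: timeouts count as retryable network errors
  constructor(ms: number) {
    super(`operation timed out after ${ms}ms`)
    this.name = 'TimeoutError'
  }
}

// Runs `task` against a deadline using Promise.race. The AbortSignal lets the
// task stop its own work when the deadline passes or the caller cancels.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
  controller: AbortController = new AbortController()
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_resolve, reject) => {
    // Caller-side cancellation (controller.abort()) also ends the race
    controller.signal.addEventListener('abort', () => reject(new Error('cancelled')))
    timer = setTimeout(() => {
      reject(new TimeoutError(ms)) // reject first so the timeout wins over 'cancelled'
      controller.abort()
    }, ms)
  })
  try {
    return await Promise.race([task(controller.signal), deadline])
  } finally {
    clearTimeout(timer) // do not leave the timer running after a fast result
  }
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

withTimeout(() => sleep(10).then(() => 'fast'), 200)
  .then(v => console.log(v)) // "fast"
withTimeout(() => sleep(200).then(() => 'slow'), 20)
  .catch(e => console.log((e as Error).name)) // "TimeoutError"
```

Passing the caller's own `AbortController` as the third argument gives the user a cancel button: calling `abort()` on it rejects the pending call with a `cancelled` error.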

Implementing Exponential Backoff

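/**
 * A complete exponential-backoff retry mechanism
 * - Configurable maximum number of retries
 * - Configurable base delay
 * - Random jitter (avoids the thundering-herd effect)
 * - Recoverable-error check
 */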
interface RetryOptions {
  maxRetries?: number
  baseDelay?: number
  maxDelay?: number
  useJitter?: boolean
  shouldRetry?: (error: any) => boolean
  onRetry?: (attempt: number, error: any, delay: number) => void
}

async function exponentialBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 30000,
    useJitter = true,
    shouldRetry = (error) => error?.recoverable === true || error?.type === 'network',
    onRetry
  } = options
  
  let lastError: any
  
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      
      // Last attempt failed
      if (attempt === maxRetries) {
        break
      }
      
      // Check whether a retry is worthwhile
      if (!shouldRetry(error)) {
        throw error
      }
      
      // Compute the delay
      let delay = baseDelay * Math.pow(2, attempt)
      delay = Math.min(delay, maxDelay)
      
      // Add random jitter (so multiple clients do not retry at the same moment)
      if (useJitter) {
        delay = delay * (0.8 + Math.random() * 0.4)
      }
      
      if (onRetry) {
        onRetry(attempt + 1, error, delay)
      }
      
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
  
  throw lastError
}

// Usage example
const result = await exponentialBackoff(
  () => getWeather('北京'),
  {
    maxRetries: 5,
    baseDelay: 500,
    onRetry: (attempt, error, delay) => {
      console.log(`🔄 Retry ${attempt}, waiting ${Math.round(delay)}ms, error: ${error.message}`)
    }
  }
)
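
To make the delay arithmetic concrete, the sketch below computes the schedule the loop above would follow before jitter is applied, using the same formula `min(baseDelay * 2^attempt, maxDelay)`, plus the range the jitter factor `0.8 + Math.random() * 0.4` can produce. `backoffSchedule` and `jitterBounds` are hypothetical helpers written only for this illustration.

```typescript
// Delay before each retry, without jitter: min(baseDelay * 2^attempt, maxDelay)
function backoffSchedule(maxRetries: number, baseDelay: number, maxDelay: number): number[] {
  const delays: number[] = []
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(baseDelay * Math.pow(2, attempt), maxDelay))
  }
  return delays
}

// The jitter factor lies in [0.8, 1.2), so a delay d ends up in [0.8d, 1.2d)
function jitterBounds(delay: number): [number, number] {
  return [delay * 0.8, delay * 1.2]
}

// Settings from the usage example above: 5 retries starting at 500ms
console.log(backoffSchedule(5, 500, 30000))
// [500, 1000, 2000, 4000, 8000]

// Default base delay with more retries: the 30s cap applies from the sixth retry on
console.log(backoffSchedule(8, 1000, 30000))
// [1000, 2000, 4000, 8000, 16000, 30000, 30000, 30000]

console.log(jitterBounds(2000))
// [1600, 2400]
```

In the worst case, the five-retry configuration therefore adds about 15.5 seconds of waiting before jitter, which is worth weighing against how long a user will tolerate a silent assistant.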

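Teaching the AI to "Give Up"

When to Give Up

Situation	Give-up strategy	AI behavior
Retries exhausted	Report the failure and suggest another approach	"Still failing after several attempts. Please try again later or use another method"
Non-recoverable error	Report directly, do not retry	"No permission to access this file. Please check the permission settings"
User cancellation	Stop execution	"OK, the operation has been cancelled"
Excessive timeout	Report the timeout and suggest trying later	"The operation timed out, possibly due to a network problem. Please try again later"
Not allowed by business logic	Terminate execution	"This operation is prohibited. Please contact the administrator"

Implementing a Graceful Give-Up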

/**
 * Executor with graceful give-up
 * - Configurable retry strategy
 * - Returns a friendly message when giving up
 *
 * ToolCall is assumed to be defined elsewhere.
 */
interface ExecutionResult {
  success: boolean
  tool_call_id: string
  result?: any
  error?: {
    code: string
    message: string
    final: boolean  // Whether this is the final give-up
    suggestion?: string
  }
}

class GracefulExecutor {
  private maxRetries: number
  // Created up front so that cancel() actually has something to abort
  private abortController = new AbortController()
  
  constructor(options: { maxRetries?: number } = {}) {
    // ?? instead of || so that an explicit maxRetries of 0 is respected
    this.maxRetries = options.maxRetries ?? 3
  }
  
  async execute(toolCall: ToolCall): Promise<ExecutionResult> {
    let lastError: any
    let attempt = 0
    
    while (attempt <= this.maxRetries) {
      // Check for cancellation before every attempt
      if (this.abortController.signal.aborted) {
        return {
          success: false,
          tool_call_id: toolCall.id,
          error: {
            code: 'USER_CANCELLED',
            message: 'The user cancelled the operation',
            final: true,
            suggestion: 'You can try again later'
          }
        }
      }
      
      try {
        const result = await this.executeTool(toolCall)
        return {
          success: true,
          tool_call_id: toolCall.id,
          result
        }
      } catch (error: any) {
        lastError = error
        
        // Check whether we should give up
        const shouldAbort = this.shouldAbort(error)
        
        if (shouldAbort || attempt === this.maxRetries) {
          // Both cases end the attempt loop, so both are final
          return this.formatFailure(toolCall.id, error, true)
        }
        
        // Wait, then retry
        const delay = this.calculateDelay(attempt, error)
        await this.sleep(delay)
        attempt++
      }
    }
    
    return this.formatFailure(toolCall.id, lastError, true)
  }
  
  private shouldAbort(error: any): boolean {
    // Give up immediately on non-recoverable errors
    if (error.recoverable === false) return true
    
    // Give up immediately on permission errors
    if (error.type === 'permission') return true
    
    // Give up immediately on authentication failures
    if (error.code === 'AUTH_FAILED') return true
    
    return false
  }
  
  private calculateDelay(attempt: number, error: any): number {
    // Prefer the retry time specified by the error (seconds to milliseconds)
    if (error.retryAfter) {
      return error.retryAfter * 1000
    }
    
    // Exponential backoff, capped at 30 seconds
    return Math.min(1000 * Math.pow(2, attempt), 30000)
  }
  
  private formatFailure(
    toolCallId: string,
    error: any,
    isFinal: boolean
  ): ExecutionResult {
    return {
      success: false,
      tool_call_id: toolCallId,
      error: {
        code: error.code || 'UNKNOWN_ERROR',
        message: error.message || 'Execution failed',
        final: isFinal,
        suggestion: error.suggestion || this.getDefaultSuggestion(error.type)
      }
    }
  }
  
  private getDefaultSuggestion(errorType: string): string {
    const suggestions: Record<string, string> = {
      network: 'Check the network connection and retry',
      rate_limit: 'Request rate is too high, please retry later',
      permission: 'Insufficient permissions, please contact the administrator',
      param: 'Check that the input parameters are correct',
      business: 'The operation failed, please confirm that the preconditions are met'
    }
    return suggestions[errorType] || 'Please retry later or contact technical support'
  }
  
  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms))
  }
  
  private async executeTool(toolCall: ToolCall): Promise<any> {
    // Actual tool execution logic
    // ...
  }
  
  // Takes effect before the next attempt; an in-flight call or wait is not interrupted
  cancel() {
    this.abortController.abort()
  }
}

In Practice: A Complete Error-Handling Agent

The Error-Aware Agent Core

import { exponentialBackoff } from './retry'
// Module path assumed; ErrorTracker is the class shown in the debugging section
import { ErrorTracker } from './error-tracker'

// ToolCall, ToolResult, ToolError and ErrorType are assumed to be defined elsewhere

class ResilientAgent {
  private errorTracker = new ErrorTracker()
  private maxConsecutiveErrors = 3
  private consecutiveErrors = 0
  private maxRetries = 3
  
  async executeWithErrorHandling(toolCalls: ToolCall[]): Promise<ToolResult[]> {
    const results: ToolResult[] = []
    
    for (const toolCall of toolCalls) {
      const result = await this.executeWithContext(toolCall)
      results.push(result)
      
      // Record errors for debugging and monitoring
      if (!result.success) {
        this.errorTracker.record(result.error, toolCall.function.name)
        this.consecutiveErrors++
        
        // Too many consecutive errors may point to a systemic problem
        if (this.consecutiveErrors >= this.maxConsecutiveErrors) {
          console.warn('⚠️ Too many consecutive errors, the system may be unhealthy')
        }
      } else {
        this.consecutiveErrors = 0
      }
    }
    
    return results
  }
  
  private async executeWithContext(toolCall: ToolCall): Promise<ToolResult> {
    const startTime = Date.now()
    
    try {
      const result = await exponentialBackoff(
        () => this.executeTool(toolCall),
        {
          maxRetries: this.maxRetries,
          baseDelay: 500,
          // Normalize first: raw errors such as ECONNRESET carry no `recoverable` field
          shouldRetry: (error) => this.normalizeError(error).recoverable,
          onRetry: (attempt, error, delay) => {
            console.log(`🔄 Retrying ${toolCall.function.name} (${attempt}/${this.maxRetries}), waiting ${Math.round(delay)}ms`)
          }
        }
      )
      
      return {
        tool_call_id: toolCall.id,
        success: true,
        result,
        duration: Date.now() - startTime
      }
    } catch (error: any) {
      return {
        tool_call_id: toolCall.id,
        success: false,
        error: this.normalizeError(error),
        duration: Date.now() - startTime
      }
    }
  }
  
  private normalizeError(error: any): ToolError {
    // Already standardized: return it as is
    if (error.error && error.error.code) {
      return error.error
    }
    
    // Standardize a raw error
    return {
      code: this.guessErrorCode(error),
      message: error.message || 'Unknown error',
      type: this.guessErrorType(error),
      recoverable: this.isRecoverable(error),
      suggestion: this.getSuggestion(error)
    }
  }
  
  private guessErrorCode(error: any): string {
    if (error.code === 'ECONNREFUSED') return 'CONNECTION_REFUSED'
    if (error.code === 'ETIMEDOUT') return 'NETWORK_TIMEOUT'
    if (error.message?.includes('rate limit')) return 'RATE_LIMIT'
    if (error.message?.includes('permission')) return 'PERMISSION_DENIED'
    return 'UNKNOWN_ERROR'
  }
  
  private guessErrorType(error: any): ErrorType {
    if (error.code?.startsWith('ECONN')) return 'network'
    // ETIMEDOUT does not start with ECONN, so it needs its own check
    if (error.code === 'ETIMEDOUT') return 'network'
    if (error.message?.includes('timeout')) return 'network'
    if (error.message?.includes('rate limit')) return 'rate_limit'
    if (error.message?.includes('permission')) return 'permission'
    if (error.message?.includes('not found')) return 'business'
    return 'unknown'
  }
  
  private isRecoverable(error: any): boolean {
    // Network errors are recoverable
    if (error.code?.startsWith('ECONN')) return true
    if (error.code === 'ETIMEDOUT') return true
    if (error.message?.includes('timeout')) return true
    
    // Rate limiting is recoverable
    if (error.message?.includes('rate limit')) return true
    
    // Permission errors are not recoverable
    if (error.message?.includes('permission')) return false
    
    // Treat everything else as non-recoverable by default
    return false
  }
  
  private getSuggestion(error: any): string {
    if (error.message?.includes('timeout')) {
      return 'Network latency is high, check the network connection and retry'
    }
    if (error.message?.includes('rate limit')) {
      return 'Request rate is too high, please retry later'
    }
    if (error.message?.includes('permission')) {
      return 'Insufficient permissions, please verify the account permissions'
    }
    return 'Please retry later or contact technical support'
  }
  
  private async executeTool(toolCall: ToolCall): Promise<any> {
    // Actual tool execution
    // ...
  }
}

Passing Error Information to the AI

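function formatErrorForAI(error: ToolError): string {
  const lines: string[] = []
  
  lines.push(`❌ Execution failed: ${error.message}`)
  
  if (error.suggestion) {
    lines.push(`💡 ${error.suggestion}`)
  }
  
  if (error.retryAfter) {
    lines.push(`⏱️ Suggested wait before retrying: ${error.retryAfter} seconds`)
  }
  
  if (error.details) {
    lines.push(`📋 Details: ${JSON.stringify(error.details)}`)
  }
  
  // Add recovery guidance
  if (error.recoverable) {
    lines.push(`🔄 This error is recoverable: you may retry, after waiting or correcting the parameters as suggested`)
  } else {
    lines.push(`❌ This error is not recoverable: do not repeat the same operation`)
  }
  
  return lines.join('\n')
}

// Usage example
// In the message history, the error reaches the AI in this form
// (`error` is a ToolError produced by a failed tool call):
const toolMessage = {
  role: 'tool',
  tool_call_id: 'call_123',
  content: formatErrorForAI(error)
}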

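After reading this, the AI understands:

What error occurred
Whether it can retry
How to correct the call
How to tell the user

Best-Practice Checklist for Error Handling

Design Phase

Every tool returns errors in the standardized format.
Mark clearly whether each error is recoverable (the recoverable field).
Provide a concrete remediation hint (the suggestion field).
Rate-limit errors include the retryAfter field.
Define a clear system of error codes.

Implementation Phase

Implement retries with exponential backoff.
Set a reasonable timeout (5 to 10 seconds is suggested).
Add error tracking and logging.
Distinguish user errors from system errors.
Let the user cancel an operation in progress.

Testing Phase

Test parameter-error scenarios.
Test network-timeout scenarios.
Test rate-limit scenarios.
Test permission-error scenarios.
Verify that the AI responds correctly to error messages.

Debugging Tips

Error Tracker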

/**
 * Error tracker
 * Records errors and helps analyze error patterns
 */
class ErrorTracker {
  private errors: Map<string, {
    count: number
    lastSeen: Date
    examples: any[]
  }> = new Map()
  
  record(error: ToolError, toolName: string) {
    const key = `${toolName}:${error.code}`
    const existing = this.errors.get(key)
    
    if (existing) {
      existing.count++
      existing.lastSeen = new Date()
      if (existing.examples.length < 5) {
        existing.examples.push(error)
      }
    } else {
      this.errors.set(key, {
        count: 1,
        lastSeen: new Date(),
        examples: [error]
      })
    }
  }
  
  report(): string {
    const report: string[] = ['📊 Error statistics report', '='.repeat(40)]
    
    for (const [key, data] of this.errors) {
      report.push(`${key}: occurred ${data.count} time(s), last seen: ${data.lastSeen.toLocaleString()}`)
      if (data.examples[0]?.suggestion) {
        report.push(`  Suggestion: ${data.examples[0].suggestion}`)
      }
    }
    
    return report.join('\n')
  }
  
  getTopErrors(limit: number = 5): Array<{ key: string; count: number }> {
    return Array.from(this.errors.entries())
      .sort((a, b) => b[1].count - a[1].count)
      .slice(0, limit)
      .map(([key, data]) => ({ key, count: data.count }))
  }
}

Enabling Verbose Logging

const DEBUG = process.env.DEBUG === 'true'

function logError(toolName: string, error: ToolError, attempt?: number) {
  if (!DEBUG) return
  
  console.group(`❌ ${toolName} failed${attempt ? ` (attempt ${attempt})` : ''}`)
  console.log(`  Code: ${error.code}`)
  console.log(`  Type: ${error.type}`)
  console.log(`  Recoverable: ${error.recoverable}`)
  console.log(`  Message: ${error.message}`)
  if (error.suggestion) console.log(`  Suggestion: ${error.suggestion}`)
  if (error.retryAfter) console.log(`  Retry After: ${error.retryAfter}s`)
  console.groupEnd()
}

function logRetry(toolName: string, attempt: number, delay: number) {
  if (!DEBUG) return
  console.log(`🔄 ${toolName} retry ${attempt}, waiting ${delay}ms`)
}

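Closing

What tricky error-handling scenarios have you run into while building AI applications? Feel free to share your experience and solutions in the comments!

If you spot mistakes in this article or have any questions, please leave a comment so we can discuss!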