
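When you call an external LLM API, network jitter, rate limiting, and server-side 500s are routine. If your code dumps a stack trace on the user the moment something fails, the experience is terrible. This post starts from basic exception handling and works up through retries and circuit breaking with fallback, so you end up with a complete defensive setup.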
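1. Know your enemy: common error types
Before you can handle an error you need to know which error it is. The table below groups the common cases:
| Status / condition | Error type | Handling |
|---|---|---|
| 401 | API key invalid or expired | Alert, do not retry |
| 403 | No permission (model not enabled) | Alert, do not retry |
| 429 | Rate limited (too many requests) | Wait and retry, or switch to a backup model |
| 500/502/503 | Server-side error | Retry 1-3 times |
| Timeout | Network or model response too slow | Retry, or degrade |
| Insufficient balance | API account quota used up | Alert, switch to backup |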
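In Spring AI, these HTTP errors surface as subclasses of HttpClientErrorException (4xx) or HttpServerErrorException (5xx), which makes per-type branching straightforward. Depending on the Spring AI version and model starter, the client's response error handler may instead wrap them in Spring AI's own TransientAiException or NonTransientAiException, so check which exception your stack actually throws before relying on the catch clauses below.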
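The table translates naturally into a small lookup. The sketch below is a hypothetical helper (ErrorPolicy and its Action names are made up for illustration and are not part of Spring AI) that maps a status code to the handling policy from the table. Timeouts and balance errors have no single status code, so they are left out.

```java
// Hypothetical helper, not part of Spring AI: maps an HTTP status code
// to the handling policy described in the table.
final class ErrorPolicy {

    enum Action { ALERT_NO_RETRY, WAIT_THEN_RETRY_OR_FALLBACK, RETRY_LIMITED, UNKNOWN }

    static Action actionFor(int status) {
        return switch (status) {
            case 401, 403 -> Action.ALERT_NO_RETRY;            // credentials or permissions: retrying cannot help
            case 429 -> Action.WAIT_THEN_RETRY_OR_FALLBACK;    // rate limited: back off or use a backup model
            case 500, 502, 503 -> Action.RETRY_LIMITED;        // server-side: retry a small number of times
            default -> Action.UNKNOWN;                         // anything else needs a case-by-case decision
        };
    }

    public static void main(String[] args) {
        for (int status : new int[] {401, 403, 429, 500, 503, 404}) {
            System.out.println(status + " -> " + actionFor(status));
        }
    }
}
```

Keeping the decision in one place like this means the retry code and the alerting code cannot drift apart on which statuses are retryable.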

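2. Basic exception handling: stop swallowing everything in one big catch
This is the most basic layer and not an elegant one, so treat it as background: handle each exception type separately instead of lumping every case into a single catch (Exception e).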

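import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.HttpClientErrorException;
import org.springframework.web.client.HttpServerErrorException;

@RestController
public class ChatController {
    private static final Logger log = LoggerFactory.getLogger(ChatController.class);
    private final ChatClient chatClient;

    public ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/chat")
    public ResponseEntity<String> chat(@RequestParam String prompt) {
        try {
            // Fluent ChatClient API, the same style used in the later examples
            String response = chatClient.prompt().user(prompt).call().content();
            return ResponseEntity.ok(response);
        } catch (HttpClientErrorException.Unauthorized e) {
            // 401: raise an alert, retrying will not help
            log.error("Invalid API key", e);
            return ResponseEntity.status(401).body("Authentication failed, please check the API key");
        } catch (HttpClientErrorException.Forbidden e) {
            // 403: no permission for this model
            log.error("Access denied", e);
            return ResponseEntity.status(403).body("Not allowed to use this model");
        } catch (HttpClientErrorException.TooManyRequests e) {
            // 429: a candidate for retry or a backup model (covered in the next section)
            log.warn("Rate limit hit");
            return ResponseEntity.status(429).body("Too many requests, please try again later");
        } catch (HttpServerErrorException e) {
            // 5xx: a candidate for retry (covered in the next section)
            log.error("Server-side error", e);
            return ResponseEntity.status(502).body("AI service is temporarily unavailable");
        }
    }
}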
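This only gets you as far as "report the error". For transient errors such as 429 and 5xx, a better approach is to wait a moment and try again.
3. Retries: give the request another chance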

Option 1: Spring Retry (recommended, declarative retries)
Add the dependencies:
<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-aspects</artifactId>
</dependency>
Add @EnableRetry to the application class:
@SpringBootApplication
@EnableRetry
public class AiApplication {
    //...
}
Put @Retryable on the Service method that should be retried; once every attempt has failed, @Recover acts as the safety net:
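import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;
import org.springframework.web.client.HttpClientErrorException;
import org.springframework.web.client.HttpServerErrorException;

@Service
public class AiService {
    private static final Logger log = LoggerFactory.getLogger(AiService.class);
    private final ChatClient chatClient;

    public AiService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // maxAttempts counts the first call too: 1 initial call + up to 2 retries
    @Retryable(
        retryFor = {HttpServerErrorException.class, HttpClientErrorException.TooManyRequests.class},
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2)
    )
    public String callWithRetry(String prompt) {
        return chatClient.prompt().user(prompt).call().content();
    }

    // Called once all attempts have failed; takes the exception plus the original arguments
    @Recover
    public String recover(Exception e, String prompt) {
        log.error("All retries failed, falling back", e);
        return "AI service is temporarily unavailable, please try again later";
    }
}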
The Controller only needs to call the Service:
// aiService is the AiService above, injected into the controller
@GetMapping("/chat-retry")
public ResponseEntity<String> chatRetry(@RequestParam String prompt) {
    String result = aiService.callWithRetry(prompt);
    return ResponseEntity.ok(result);
}
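To make the @Backoff settings concrete: with delay = 1000, multiplier = 2, and maxAttempts = 3, there is one initial call plus at most two retries, with waits of roughly 1 s and then 2 s between them. The plain-Java sketch below only reproduces that arithmetic; BackoffSchedule is a hypothetical name, and the sketch ignores any maximum-delay cap or jitter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the waits implied by an exponential backoff configuration.
final class BackoffSchedule {

    /** Waits between attempts: maxAttempts attempts means maxAttempts - 1 waits. */
    static List<Long> delaysMs(long initialDelayMs, double multiplier, int maxAttempts) {
        List<Long> delays = new ArrayList<>();
        double delay = initialDelayMs;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            delays.add((long) delay);   // wait after the attempt that just failed
            delay *= multiplier;        // grow the wait for the next failure
        }
        return delays;
    }

    public static void main(String[] args) {
        // Same numbers as the @Backoff example: delay = 1000, multiplier = 2, maxAttempts = 3
        System.out.println(delaysMs(1000, 2.0, 3));
    }
}
```

A quick calculation like this is useful when choosing maxAttempts: the sum of the waits is extra latency the caller sees in the worst case.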
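Option 2: manual retry that honors the Retry-After header

Some APIs include a Retry-After header in a 429 response that tells you how many seconds to wait. @Retryable cannot read response headers directly, so a manual loop is a better fit here: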
package com.jichi.springaialibaba.service;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;
import org.springframework.web.client.HttpClientErrorException;
import org.springframework.web.client.HttpServerErrorException;

@Service
public class ManualRetryChatService {
    private static final Logger log = LoggerFactory.getLogger(ManualRetryChatService.class);
    private final ChatClient chatClient;

    public ManualRetryChatService(@Qualifier("primaryChatClient") ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String chatWithManualRetry(String message) {
        int maxAttempts = 3;
        long delayMs = 1000;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return chatClient.prompt()
                        .user(message)
                        .call()
                        .content();
            } catch (HttpClientErrorException.TooManyRequests e) {
                if (attempt == maxAttempts) {
                    throw new RuntimeException("Rate limit exceeded, please try again later", e);
                }
                // Prefer the Retry-After response header; otherwise use the default wait
                String retryAfter = e.getResponseHeaders() != null
                        ? e.getResponseHeaders().getFirst("Retry-After") : null;
                long waitMs = delayMs;
                if (retryAfter != null) {
                    try {
                        waitMs = Long.parseLong(retryAfter.trim()) * 1000;
                    } catch (NumberFormatException nfe) {
                        // Retry-After may also be an HTTP date; keep the default wait in that case
                    }
                }
                log.warn("Rate limited; waiting {}ms before retrying (attempt {}/{})", waitMs, attempt, maxAttempts);
                sleep(waitMs);
            } catch (HttpServerErrorException e) {
                if (attempt == maxAttempts) throw new RuntimeException("AI service error", e);
                log.warn("Server error; retrying in {}ms (attempt {}/{})", delayMs, attempt, maxAttempts);
                sleep(delayMs);
                delayMs *= 2; // exponential backoff
            } catch (Exception e) {
                // Anything else is not retryable
                throw new RuntimeException("AI call failed", e);
            }
        }
        // Not reached in practice (the last attempt either returns or throws), but required by the compiler
        return "AI service is temporarily unavailable";
    }

    private void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new RuntimeException("Retry interrupted", e);
        }
    }
}
The corresponding Controller:
package com.jichi.springaialibaba.controller;

import com.jichi.springaialibaba.service.ManualRetryChatService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/manual-retry")
public class ManualRetryChatController {
    private final ManualRetryChatService manualRetryChatService;

    public ManualRetryChatController(ManualRetryChatService manualRetryChatService) {
        this.manualRetryChatService = manualRetryChatService;
    }

    @GetMapping
    public String chat(@RequestParam String message) {
        return manualRetryChatService.chatWithManualRetry(message);
    }
}
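One detail worth handling: under the HTTP specification, Retry-After can carry either a number of seconds or an HTTP date. Here is a minimal sketch of a parser that accepts both and falls back to a default wait otherwise. RetryAfterParser is a hypothetical helper, not something Spring provides, and the current time is passed in explicitly so the logic is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: turns a Retry-After header value into a wait duration.
final class RetryAfterParser {

    static Duration parse(String headerValue, Instant now, Duration fallback) {
        if (headerValue == null || headerValue.isBlank()) {
            return fallback;
        }
        String value = headerValue.trim();
        try {
            // Form 1: delay in seconds, e.g. "120"
            long seconds = Long.parseLong(value);
            return seconds >= 0 ? Duration.ofSeconds(seconds) : fallback;
        } catch (NumberFormatException notANumber) {
            // fall through and try the date form
        }
        try {
            // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            Instant retryAt = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, retryAt);
            return wait.isNegative() ? Duration.ZERO : wait; // a date in the past means "retry now"
        } catch (DateTimeParseException unparseable) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Duration fallback = Duration.ofSeconds(1);
        System.out.println(parse("120", Instant.now(), fallback));
        System.out.println(parse("not-a-value", Instant.now(), fallback));
    }
}
```

In a retry loop you would typically also cap the returned duration, so that a very large Retry-After value cannot block a request thread for minutes.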
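4. Circuit breaking and fallback: avoiding cascading failures
Retries only help with transient problems. If the model API stays down (say, for five minutes straight) and every call still retries three times, the whole system slows to a crawl. This is where a circuit breaker comes in: once the failure rate crosses a threshold, calls fail fast without touching the real API, and after a waiting period the breaker automatically probes for recovery.

Resilience4j is a common choice in the Spring ecosystem. Add the dependency (its annotations are applied through Spring AOP, so spring-boot-starter-aop also needs to be on the classpath):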
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.3.0</version>
</dependency>

Configure the circuit breaker and the retry:
resilience4j:
  circuitbreaker:
    instances:
      aiService:
        sliding-window-size: 10                          # evaluate the most recent 10 calls
        failure-rate-threshold: 50                       # open the circuit once the failure rate reaches 50%
        wait-duration-in-open-state: 30s                 # stay open for 30s before moving to half-open
        permitted-number-of-calls-in-half-open-state: 3  # allow 3 trial calls while half-open
  retry:
    instances:
      aiService:
        max-attempts: 3
        wait-duration: 1s
        retry-exceptions:
          - org.springframework.web.client.HttpServerErrorException
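To see what the four circuit breaker settings mean in practice, here is a deliberately simplified, count-based breaker in plain Java. It is a teaching sketch under simplifying assumptions (the class name, the explicit nowMs clock parameter, and clearing the window on every state change are all illustrative) and is not how Resilience4j is implemented internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified count-based circuit breaker, for illustration only.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;               // like sliding-window-size
    private final double failureRateThreshold;  // percent, like failure-rate-threshold
    private final long openWaitMs;              // like wait-duration-in-open-state
    private final int halfOpenCalls;            // like permitted-number-of-calls-in-half-open-state

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMs;
    private int halfOpenAdmitted, halfOpenSeen, halfOpenFailures;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold, long openWaitMs, int halfOpenCalls) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMs = openWaitMs;
        this.halfOpenCalls = halfOpenCalls;
    }

    /** Returns true if a call may go through at time nowMs. */
    synchronized boolean allowRequest(long nowMs) {
        if (state == State.OPEN && nowMs - openedAtMs >= openWaitMs) {
            state = State.HALF_OPEN;             // waiting period is over: start probing
            halfOpenAdmitted = halfOpenSeen = halfOpenFailures = 0;
        }
        if (state == State.HALF_OPEN) {
            if (halfOpenAdmitted >= halfOpenCalls) return false; // only a few trial calls
            halfOpenAdmitted++;
            return true;
        }
        return state == State.CLOSED;
    }

    /** Records the outcome of a call that was allowed through. */
    synchronized void onResult(boolean failed, long nowMs) {
        if (state == State.OPEN) return;         // late result from before the circuit opened
        if (state == State.HALF_OPEN) {
            halfOpenSeen++;
            if (failed) halfOpenFailures++;
            if (halfOpenSeen >= halfOpenCalls) { // all trial calls are in: decide
                if (100.0 * halfOpenFailures / halfOpenSeen >= failureRateThreshold) open(nowMs);
                else close();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
        if (window.size() == windowSize) {       // only judge once the window is full
            long failures = window.stream().filter(f -> f).count();
            if (100.0 * failures / windowSize >= failureRateThreshold) open(nowMs);
        }
    }

    synchronized State state() { return state; }

    private void open(long nowMs) { state = State.OPEN; openedAtMs = nowMs; window.clear(); }
    private void close() { state = State.CLOSED; window.clear(); }

    public static void main(String[] args) {
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(10, 50.0, 30_000, 3);
        for (int i = 0; i < 10; i++) cb.onResult(i % 2 == 0, 0); // 5 failures out of 10
        System.out.println(cb.state());
    }
}
```

The point to take away is the cycle: closed while the failure rate is acceptable, open (fail fast) once it is not, then a handful of trial calls in half-open decide whether to close again or reopen.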

package com.jichi.springaialibaba.service;

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class ResilientChatService {
    private static final Logger log = LoggerFactory.getLogger(ResilientChatService.class);
    private final ChatClient primaryChatClient;
    private final ChatClient backupChatClient;

    public ResilientChatService(
            @Qualifier("primaryChatClient") ChatClient primaryChatClient,
            @Qualifier("backupChatClient") ChatClient backupChatClient) {
        this.primaryChatClient = primaryChatClient;
        this.backupChatClient = backupChatClient;
    }

    /**
     * With Resilience4j's default aspect order, Retry wraps CircuitBreaker, so every attempt
     * passes through the breaker and counts toward its failure rate. The fallback is declared
     * on @Retry so that it runs only after the retries are exhausted or the circuit is open;
     * a fallback on @CircuitBreaker would absorb the exception before Retry ever saw it.
     */
    @CircuitBreaker(name = "aiService")
    @Retry(name = "aiService", fallbackMethod = "fallbackChat")
    public String chat(String message) {
        return primaryChatClient.prompt()
                .user(message)
                .call()
                .content();
    }

    /**
     * Fallback: switch to the backup model when the primary model is unavailable.
     * The signature must match the original method, plus a trailing Throwable parameter.
     */
    public String fallbackChat(String message, Throwable throwable) {
        log.warn("Primary model unavailable ({}), switching to the backup model", throwable.getMessage());
        try {
            return backupChatClient.prompt()
                    .user(message)
                    .call()
                    .content();
        } catch (Exception e) {
            log.error("Backup model is unavailable as well", e);
            return "AI service is temporarily unavailable, please try again later. For urgent requests, please contact support.";
        }
    }
}
The corresponding Controller:
package com.jichi.springaialibaba.controller;

import com.jichi.springaialibaba.service.ResilientChatService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/resilient")
public class ResilientChatController {
    private final ResilientChatService resilientChatService;

    public ResilientChatController(ResilientChatService resilientChatService) {
        this.resilientChatService = resilientChatService;
    }

    @GetMapping
    public String chat(@RequestParam String message) {
        return resilientChatService.chat(message);
    }
}
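The fallback order used in the service (primary model, then backup model, then a fixed reply) can be expressed independently of any framework. A minimal sketch, with each model call stood in for by a Function<String, String>; FallbackChain and firstSuccessful are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: try each model in order, return a fixed reply if all of them fail.
final class FallbackChain {

    static String firstSuccessful(String message,
                                  List<Function<String, String>> models,
                                  String staticReply) {
        for (Function<String, String> model : models) {
            try {
                return model.apply(message);     // first model that answers wins
            } catch (RuntimeException e) {
                // this model failed: move on to the next one in the list
            }
        }
        return staticReply;                      // every model failed
    }

    public static void main(String[] args) {
        Function<String, String> primary = m -> { throw new IllegalStateException("primary down"); };
        Function<String, String> backup = m -> "backup: " + m;
        System.out.println(firstSuccessful("hi", List.of(primary, backup), "unavailable"));
    }
}
```

Making the chain explicit also makes it easy to see the cost: each extra fallback level adds its own latency when the levels above it are failing.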
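5. Global exception handling: keeping Controllers clean
Earlier we wrote a fair amount of try-catch inside the Controller, which is repetitive and ugly. With @RestControllerAdvice you can intercept AI-related exceptions in one place and return a uniform error structure.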
package com.jichi.springaialibaba.exception;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import org.springframework.web.client.HttpClientErrorException;
import org.springframework.web.client.HttpServerErrorException;
import java.util.concurrent.TimeoutException;

@RestControllerAdvice
public class AiExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(AiExceptionHandler.class);

    // Uniform error body returned to callers
    record ErrorResponse(String code, String message) {}

    @ExceptionHandler(HttpClientErrorException.Unauthorized.class)
    public ResponseEntity<ErrorResponse> handleUnauthorized(HttpClientErrorException.Unauthorized e) {
        log.error("Invalid API key", e);
        return ResponseEntity.status(HttpStatus.UNAUTHORIZED)
                .body(new ErrorResponse("UNAUTHORIZED", "API key is invalid or expired"));
    }

    @ExceptionHandler(HttpClientErrorException.Forbidden.class)
    public ResponseEntity<ErrorResponse> handleForbidden(HttpClientErrorException.Forbidden e) {
        log.error("Insufficient permissions", e);
        return ResponseEntity.status(HttpStatus.FORBIDDEN)
                .body(new ErrorResponse("FORBIDDEN", "Not allowed to access this model"));
    }

    @ExceptionHandler(HttpClientErrorException.TooManyRequests.class)
    public ResponseEntity<ErrorResponse> handleRateLimit(HttpClientErrorException.TooManyRequests e) {
        return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
                .body(new ErrorResponse("RATE_LIMIT", "Too many requests, please try again later"));
    }

    @ExceptionHandler(HttpServerErrorException.class)
    public ResponseEntity<ErrorResponse> handleServerError(HttpServerErrorException e) {
        log.error("AI server-side error", e);
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(new ErrorResponse("AI_SERVICE_ERROR", "AI service is temporarily unavailable"));
    }

    // Applies when the call is bounded by a Future or a time limiter. Connect/read timeouts of the
    // HTTP client itself usually surface as ResourceAccessException and would need their own handler.
    @ExceptionHandler(TimeoutException.class)
    public ResponseEntity<ErrorResponse> handleTimeout(TimeoutException e) {
        return ResponseEntity.status(HttpStatus.GATEWAY_TIMEOUT)
                .body(new ErrorResponse("TIMEOUT", "AI response timed out, please try again"));
    }
}
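With this in place, Controllers can focus on business logic while exceptions are handled in one spot, and the code is a lot cleaner.
6. Summary
A complete defense for AI calls is best thought of in layers: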
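- Basic exception handling: respond differently to each type of error
- Retries: immediate compensation for transient errors (declarative Spring Retry, or a manual loop that reads Retry-After)
- Circuit breaking and fallback: fail fast during prolonged outages (Resilience4j)
- Timeout control: keep threads from blocking
- Global exception handling: one consistent output to callers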
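Timeout control is on the list but has no code in this post. In practice you would usually set connect and read timeouts on the underlying HTTP client. As a framework-independent illustration, the sketch below bounds a blocking call with CompletableFuture.orTimeout (Java 9+). TimeoutGuard is a hypothetical helper, and note that the timed-out task is not cancelled; the caller simply stops waiting for it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: run a blocking call with an upper bound on how long the caller waits.
final class TimeoutGuard {

    static String callWithTimeout(Supplier<String> call, long timeoutMs, String fallback) {
        try {
            return CompletableFuture.supplyAsync(call)          // runs on the common pool
                    .orTimeout(timeoutMs, TimeUnit.MILLISECONDS) // fails the future on timeout
                    .join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return fallback;                                 // too slow: degrade instead of blocking
            }
            throw e;                                             // other failures are not hidden here
        }
    }

    public static void main(String[] args) {
        String reply = callWithTimeout(() -> {
            try {
                Thread.sleep(500);                               // simulate a slow model
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "late answer";
        }, 50, "AI response timed out");
        System.out.println(reply);
    }
}
```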
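Add this combination to your project and your AI calls go from "breaks at the first touch" to rock solid. Jige's blog will keep publishing more hands-on details, so stay tuned!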