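How to Configure Multiple Spring AI LLM Clients in a Spring Boot Application

1. Overview

A growing number of modern applications integrate large language models (LLMs) to build smarter features. Earlier posts on this blog covered several examples of using Spring AI to integrate LLMs from different providers into a Spring Boot application. While a single LLM can handle many kinds of tasks, relying on just one model is not always the best approach.

Different models have different strengths: some excel at technical analysis, while others are better suited to creative writing. Simple tasks are a good fit for lightweight, cost-effective models, whereas complex tasks are better handled by more powerful ones.

This article demonstrates how to integrate multiple LLMs into a Spring Boot application using Spring AI.

We'll configure models from different providers as well as multiple models from the same provider. Then, building on these configurations, we'll create a resilient chatbot that automatically switches between models when a failure occurs.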

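2. Configuring LLMs From Different Providers

Let's start by configuring two LLMs from different providers in our application.

In this article's examples, we'll use OpenAI and Anthropic as the AI model providers.

2.1. Configuring the Primary LLM

We'll first configure an OpenAI model as our primary LLM.

First, add the required dependency to the project's pom.xml file: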

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
    <version>1.0.2</version>
</dependency>

This OpenAI starter dependency is a wrapper around the OpenAI Chat Completions API and lets us interact with OpenAI models from our application.

Next, configure the OpenAI API key and chat model in application.yaml:

spring:
  ai:
    open-ai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: ${PRIMARY_LLM}
          temperature: 1

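We use the ${} property placeholder to load the property values from environment variables. We also set the temperature to 1, because some newer OpenAI models accept only this default value.

Once these properties are set, Spring AI automatically creates a bean of type OpenAiChatModel. We use it to define a ChatClient bean, which serves as the main entry point for interacting with the LLM: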

@Configuration
class ChatbotConfiguration {

    @Bean
    @Primary
    ChatClient primaryChatClient(OpenAiChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}

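In the ChatbotConfiguration class, we use the OpenAiChatModel bean to create the ChatClient for our primary LLM.

We mark this bean with the @Primary annotation. When a component injects a ChatClient without a qualifier, Spring injects this one.

2.2. Configuring the Secondary LLM

Now, we'll configure a model from Anthropic as our secondary LLM.

First, add the Anthropic starter dependency to pom.xml: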

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
    <version>1.0.2</version>
</dependency>

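This dependency is a wrapper around the Anthropic Messages API and provides the classes needed to connect to and interact with Anthropic models.

Next, define the configuration properties for the secondary model: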

spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: ${SECONDARY_LLM}

As with the primary LLM's configuration, we load the Anthropic API key and model ID from environment variables.

Finally, create a dedicated ChatClient bean for the secondary model:

@Bean
ChatClient secondaryChatClient(AnthropicChatModel chatModel) {
    return ChatClient.create(chatModel);
}

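Here, we use the AnthropicChatModel bean that Spring AI autoconfigures to create secondaryChatClient.

3. Configuring Multiple LLMs From the Same Provider

Often, the multiple LLMs we need to configure come from the same AI provider.

Spring AI doesn't natively support this scenario: its autoconfiguration creates only one ChatModel bean per provider. For any additional models, we therefore need to define the ChatModel beans manually.

Let's walk through the process by configuring a second Anthropic model in our application: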

spring:
  ai:
    anthropic:
      chat:
        options:
          tertiary-model: ${TERTIARY_LLM}

Under the Anthropic configuration in application.yaml, we add a custom property that holds the model name of the third (tertiary) LLM.

Next, define the necessary beans for the tertiary LLM:

@Bean
ChatModel tertiaryChatModel(
    AnthropicApi anthropicApi,
    AnthropicChatModel anthropicChatModel,
    @Value("${spring.ai.anthropic.chat.options.tertiary-model}") String tertiaryModelName
) {
    AnthropicChatOptions chatOptions = anthropicChatModel.getDefaultOptions().copy();
    chatOptions.setModel(tertiaryModelName);
    return AnthropicChatModel.builder()
      .anthropicApi(anthropicApi)
      .defaultOptions(chatOptions)
      .build();
}

@Bean
ChatClient tertiaryChatClient(@Qualifier("tertiaryChatModel") ChatModel tertiaryChatModel) {
    return ChatClient.create(tertiaryChatModel);
}

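First, to create the custom ChatModel bean, we inject the autoconfigured AnthropicApi bean and the default AnthropicChatModel bean used for the secondary LLM, and we inject the tertiary model name property using @Value.

We copy the default options of the existing AnthropicChatModel and override only the model name.

This setup assumes that both Anthropic models share the same API key and other configuration. If they need different properties, we can customize AnthropicChatOptions further.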
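
The "copy the defaults, then override one field" idea can be illustrated without any Spring AI types. The following is a minimal sketch in plain Java; ModelSettings and its fields are hypothetical stand-ins for a provider options class, not part of Spring AI:

```java
// Hypothetical stand-in for a provider options class; not a Spring AI type.
record ModelSettings(String model, double temperature, int maxTokens) {

    // Returns a copy that differs from this instance only in the model name.
    ModelSettings withModel(String newModel) {
        return new ModelSettings(newModel, temperature, maxTokens);
    }
}

final class OptionsCopyDemo {

    public static void main(String[] args) {
        // Shared defaults, as the secondary model would use them (values are illustrative).
        ModelSettings secondary = new ModelSettings("secondary-model", 0.7, 1024);

        // The tertiary model reuses every default except the model name.
        ModelSettings tertiary = secondary.withModel("tertiary-model");

        System.out.println(secondary);
        System.out.println(tertiary);
    }
}
```

Because the copy is a separate object, changing the tertiary settings later never affects the secondary model's defaults.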

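Finally, we use the custom tertiaryChatModel to create the third ChatClient bean in our configuration class.

4. Exploring a Practical Use Case

With the multi-model configuration in place, let's implement a practical use case. We'll build a resilient chatbot that automatically falls back to alternative models, in order, when the primary model fails.

4.1. Building a Resilient Chatbot

To implement the fallback logic, we'll use Spring Retry. Its annotations take effect only when the spring-retry dependency is on the classpath and retry support is enabled, typically with @EnableRetry on a configuration class.

Create a new ChatbotService class and inject the three ChatClient beans we defined. Then, define an entry-point method that uses the primary LLM: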

@Retryable(retryFor = Exception.class, maxAttempts = 3)
String chat(String prompt) {
    logger.debug("Attempting to process prompt '{}' with primary LLM. Attempt #{}",
        prompt, RetrySynchronizationManager.getContext().getRetryCount() + 1);
    return primaryChatClient
      .prompt(prompt)
      .call()
      .content();
}

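Here, we create a chat() method that uses primaryChatClient. The method is annotated with @Retryable, so it is attempted up to three times in total (the initial call plus two retries) when any Exception is thrown.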
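
To make the attempt counting concrete, here is a minimal hand-rolled sketch of the "try up to maxAttempts times" behavior in plain Java. RetryDemo and its retry() helper are hypothetical and are not part of Spring Retry; the sketch also omits any delay between attempts:

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Spring Retry: it only mimics the
// "attempt the call up to maxAttempts times" behavior, with no backoff.
final class RetryDemo {

    static <T> T retry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get(); // success ends the loop immediately
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next attempt
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated call that fails twice and succeeds on the third attempt.
        String result = retry(3, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated failure");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " calls"); // prints "ok after 3 calls"
    }
}
```

With maxAttempts set to 3, the action runs at most three times, which matches the three "Attempt #" entries a failing primary LLM produces.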

Next, define a recovery method:

@Recover
String chat(Exception exception, String prompt) {
    logger.warn("Primary LLM failure. Error received: {}", exception.getMessage());
    logger.debug("Attempting to process prompt '{}' with secondary LLM", prompt);
    try {
        return secondaryChatClient
          .prompt(prompt)
          .call()
          .content();
    } catch (Exception e) {
        logger.warn("Secondary LLM failure: {}", e.getMessage());
        logger.debug("Attempting to process prompt '{}' with tertiary LLM", prompt);
        return tertiaryChatClient
          .prompt(prompt)
          .call()
          .content();
    }
}

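The overloaded chat() method annotated with @Recover serves as the fallback once the original chat() method has failed and exhausted its retries.

We first try to get a response through secondaryChatClient; if that also fails, we make a final attempt with tertiaryChatClient.

We use a simple try-catch here because Spring Retry allows only one recovery method per method signature. In a production application, however, we should consider a more robust solution such as Resilience4j.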
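
If more fallback models are added, the nested try-catch grows one level per model. One possible alternative is to iterate over an ordered list of clients and return the first successful response. The sketch below is plain Java under stated assumptions: FallbackChainDemo and firstSuccessful() are hypothetical names, and each Function<String, String> stands in for a ChatClient call such as client.prompt(prompt).call().content():

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: tries each client in order and returns the first
// successful response. Each Function stands in for one ChatClient call.
final class FallbackChainDemo {

    static String firstSuccessful(List<Function<String, String>> clients, String prompt) {
        RuntimeException last = new IllegalStateException("No clients configured");
        for (Function<String, String> client : clients) {
            try {
                return client.apply(prompt);
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next client
            }
        }
        throw last; // all clients failed: rethrow the most recent error
    }

    public static void main(String[] args) {
        // Simulated clients: the first always fails, the second echoes the prompt.
        Function<String, String> failing = p -> {
            throw new IllegalStateException("model not found");
        };
        Function<String, String> working = p -> "echo: " + p;

        System.out.println(firstSuccessful(List.of(failing, working), "hello")); // prints "echo: hello"
    }
}
```

Inside the recovery method, the list would hold the secondary and tertiary clients in that order, so adding a fourth model means adding one list element rather than another try-catch level.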

With the service layer implemented, let's expose a REST API:

@PostMapping("/api/chatbot/chat")
ChatResponse chat(@RequestBody ChatRequest request) {
    String response = chatbotService.chat(request.prompt());
    return new ChatResponse(response);
}

record ChatRequest(String prompt) {}
record ChatResponse(String response) {}

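Here, we define a POST endpoint, /api/chatbot/chat, that accepts a prompt, passes it to the service layer, and returns the response wrapped in a ChatResponse record.

4.2. Testing Our Chatbot

Finally, let's test the chatbot and verify that the fallback mechanism works correctly.

Start the application with environment variables that set invalid model names for the primary and secondary LLMs and a valid model name for the tertiary LLM: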

OPENAI_API_KEY=.... \
ANTHROPIC_API_KEY=.... \
PRIMARY_LLM=gpt-100 \
SECONDARY_LLM=claude-opus-200 \
TERTIARY_LLM=claude-3-haiku-20240307 \
mvn spring-boot:run

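In the command above, gpt-100 and claude-opus-200 are invalid model names that cause API errors, whereas claude-3-haiku-20240307 is a valid model offered by Anthropic.

Next, call the endpoint with the HTTPie CLI to interact with the chatbot: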

http POST :8080/api/chatbot/chat prompt="What is the capital of France?"

Here, we send the chatbot a simple prompt. Let's look at the response:

{
    "response": "The capital of France is Paris."
}

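As we can see, the chatbot returned a correct response even though the primary and secondary LLMs were configured with invalid models, which confirms that the system fell back to the tertiary LLM.

To see the fallback logic in action, let's also look at the application logs: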

[2025-09-30 12:56:03] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with primary LLM. Attempt #1
[2025-09-30 12:56:05] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with primary LLM. Attempt #2
[2025-09-30 12:56:06] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with primary LLM. Attempt #3
[2025-09-30 12:56:07] [WARN] [com.baeldung.multillm.ChatbotService] - Primary LLM failure. Error received: HTTP 404 - {
    "error": {
        "message": "The model `gpt-100` does not exist or you do not have access to it.",
        "type": "invalid_request_error",
        "param": null,
        "code": "model_not_found"
    }
}
[2025-09-30 12:56:07] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with secondary LLM
[2025-09-30 12:56:07] [WARN] [com.baeldung.multillm.ChatbotService] - Secondary LLM failure: HTTP 404 - {"type":"error","error":{"type":"not_found_error","message":"model: claude-opus-200"},"request_id":"req_011CTeBrAY8rstsSPiJyv3sj"}
[2025-09-30 12:56:07] [DEBUG] [com.baeldung.multillm.ChatbotService] - Attempting to process prompt 'What is the capital of France?' with tertiary LLM

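The logs clearly show the execution flow of the request.

We can see that the primary LLM fails on three consecutive attempts; the service then tries the secondary LLM, which also fails; finally, it calls the tertiary LLM, which processes the prompt and returns the response we saw.

This shows that the fallback mechanism works as designed: the chatbot remains available even when multiple LLMs fail at the same time.

5. Conclusion

This article explored how to integrate multiple LLMs into a single Spring AI application. First, we showed how Spring AI's abstraction layer simplifies configuring models from different providers, such as OpenAI and Anthropic. Next, we tackled a more complex scenario: configuring multiple models from the same provider and creating custom beans where Spring AI's autoconfiguration falls short. Finally, we used the multi-model configuration to build a resilient, highly available chatbot. With Spring Retry, we implemented a cascading fallback pattern that automatically switches between LLMs when failures occur.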