Spring AI 2.0 Multi-Provider Configuration: Deep Integration with OpenAI, Gemini, Anthropic, and Ollama

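Preface

One notable ecosystem change in the move from Spring Boot 3.x to 4.0 is that Spring AI has followed along into its 2.0 line. If 1.x was the trailblazer for LLM integration, 2.0 is about refining support for a multi-provider ecosystem.

As enterprise architects, we face the same questions every day: which model provider should we choose, how do we handle configuration differences, and how do we switch between models smoothly? These are not only technical questions. They are strategic ones about cost optimization and avoiding vendor lock-in.

Based on the Spring AI 2.0-M1/M2 milestone releases, this article takes a hands-on look at the configuration differences and integration strategies for four mainstream providers: OpenAI, Google Gemini, Anthropic Claude, and Ollama.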

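1. Spring AI 2.0 Release Highlights

1.1 Version Baseline Changes

Spring AI 2.0-M1/M2 introduces several major changes:

# Spring AI 2.0 dependency baseline
Spring Boot: 4.0 GA+
Spring Framework: 7.0+
Java: 21 required (virtual threads, AOT compilation)

# Newly supported models
Models:
  - OpenAI: GPT-5-mini (new default model; best cost/performance ratio)
  - Google: Gemini 1.5 Pro (Thinking Mode support)
  - Anthropic: Claude 3.5 Sonnet, Claude 4 (upgraded tool calling)
  - Ollama: fully private, local deployment

# Vector store additions
VectorStore:
  - Amazon S3 PostgreSQL
  - Infinispan (enterprise-grade cache)
  - Amazon Bedrock Knowledge Base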

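1.2 Why Multi-Model Configuration Matters

The current state of the industry:

Scenario	Best-fit model	Approximate cost
Fast text generation	GPT-5-mini	$0.15/M tokens
Complex reasoning	Gemini 1.5 Pro + Thinking	$3-5/M tokens
Tool calling	Claude 3.5 Sonnet	$3/M tokens
Local, private deployment	Ollama Mistral/Llama2	0 (self-hosted)

Key insight: there is no universally best model, only a best selection strategy. If you can configure across models, you keep the initiative on both cost optimization and provider switching.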

2. Spring AI Core Configuration Architecture

2.1 Layered Configuration Model

┌──────────────────────────────────────────────────────────┐
│ Application Layer (business code)                        │
│ - ChatModel.call(prompt): one unified interface          │
└──────────────────────────────────────────────────────────┘
                           ↓
┌──────────────────────────────────────────────────────────┐
│ Portable Options Layer (cross-model adaptation)          │
│ - temperature, topP (Double), maxTokens (Integer)        │
│ - Smooths over parameter differences between models      │
└──────────────────────────────────────────────────────────┘
                           ↓
┌──────────────────────────────────────────────────────────┐
│ Model-Specific Options Layer                             │
│ - OpenAiChatOptions (ResponseFormat, StreamOptions, ...) │
│ - Gemini options (ThinkingLevel, SafetySettings, ...)    │
│ - AnthropicChatOptions (tools, metadata, ...)            │
│ - Ollama options (numPredict, repeatPenalty, ...)        │
└──────────────────────────────────────────────────────────┘
                           ↓
┌──────────────────────────────────────────────────────────┐
│ Client Layer (SDK adaptation)                            │
│ - OpenAI Client SDK                                      │
│ - Google Vertex AI SDK                                   │
│ - Anthropic SDK                                          │
│ - Ollama HTTP API                                        │
└──────────────────────────────────────────────────────────┘

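2.2 The Design Philosophy Behind Portable Options

Portable Options are central to Spring AI's design. They achieve cross-model compatibility through the following mechanism:

// Unified cross-model interface (ChatOptions.java, abridged)
public interface ChatOptions {
    Double getTemperature();        // Double; supported by all models
    Double getTopP();               // Double; supported by all models
    Integer getMaxTokens();         // every model has a token limit
    String getModel();              // model name
    List<String> getStopSequences();// stop sequences, shared across models
}

// ChatOptions.builder() offers fluent configuration
// (current releases use plain method names; the older with* prefix was removed)
ChatOptions options = ChatOptions.builder()
    .temperature(0.7)              // sampling randomness; the accepted range varies by provider
    .topP(0.9)                     // nucleus sampling
    .maxTokens(2048)               // upper bound on generated tokens
    .build();

Why Double? Parameter types and ranges differ between providers:

OpenAI: temperature is a Float (0-2)
Anthropic: temperature is a Float (0-1)
Gemini: temperature is a Double (0-2)

Spring AI uses Double as its internal unified type and converts in each model's adapter layer. This is a textbook application of the adapter pattern.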
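
As a rough illustration of that adapter step, the sketch below clamps one portable Double temperature into each provider's accepted range. It is not Spring AI code: `TemperatureAdapter` and `ProviderRange` are hypothetical names, and the ranges are simply the ones listed above.

```java
import java.util.Map;

/** Hypothetical sketch: one portable Double temperature, clamped per provider. */
final class TemperatureAdapter {

    /** Closed interval of temperatures a provider accepts (values from the list above). */
    record ProviderRange(double min, double max) {}

    private static final Map<String, ProviderRange> RANGES = Map.of(
        "openai", new ProviderRange(0.0, 2.0),
        "anthropic", new ProviderRange(0.0, 1.0),
        "gemini", new ProviderRange(0.0, 2.0));

    /** Clamps the portable value into the named provider's range. */
    static double adapt(String provider, double temperature) {
        ProviderRange range = RANGES.get(provider);
        if (range == null) {
            throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return Math.max(range.min(), Math.min(range.max(), temperature));
    }

    public static void main(String[] args) {
        System.out.println(adapt("openai", 1.5));
        System.out.println(adapt("anthropic", 1.5));
    }
}
```

Whether to clamp silently or to reject out-of-range values is a design choice: clamping keeps a shared default usable across providers, while rejecting surfaces misconfiguration earlier.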

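3. OpenAI Configuration in Depth

3.1 Dependency Declaration

<!-- pom.xml: starters were renamed to spring-ai-starter-model-* before 1.0 GA.
     Use the 2.0 milestone you target, or manage versions through spring-ai-bom. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
    <version>2.0.0-M1</version>
</dependency>

3.2 Configuration Properties

# application-openai.yml
spring:
  ai:
    model:
      chat: openai                    # selects the chat provider (replaces the older chat.enabled flag)
    openai:
      api-key: sk-proj-xxx  # from https://platform.openai.com/api-keys
      base-url: https://api.openai.com/v1  # can point at a reverse proxy
      
      # Choose the model to match the workload
      chat:
        options:
          model: gpt-5-mini           # recommended default in 2.0 (best cost/performance)
          # model: gpt-4-turbo        # complex reasoning
          # model: gpt-4              # hardest reasoning tasks
          temperature: 0.7            # 0 = near-deterministic; higher = more random
          top-p: 0.9                  # nucleus sampling
          presence-penalty: 0.0       # discourages repeating topics (-2 to 2)
          frequency-penalty: 0.0      # discourages repeating words (-2 to 2)
          max-tokens: 2048            # output token limit
          
          # Advanced features
          response-format:
            type: json_object         # force JSON output
          
          stream-options:
            include-usage: true       # include token counts in streamed responses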

3.3 Core Programming Interface

import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.openai.api.ResponseFormat;
import org.springframework.ai.support.ToolCallbacks;
import org.springframework.ai.tool.ToolCallback;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;

// 1. Inject ChatModel (constructor injection; @Autowired is optional with a single constructor)
@RestController
public class OpenAIController {
    
    private final ChatModel chatModel;
    
    public OpenAIController(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    // 2. Basic call: uses the global configuration
    @GetMapping("/simple")
    public String simple(@RequestParam String message) {
        return chatModel.call(message);
    }
    
    // 3. Advanced call: override configuration at runtime
    @PostMapping("/advanced")
    public ChatResponse advanced(@RequestBody QueryRequest req) {
        
        // OpenAI-specific options
        OpenAiChatOptions options = OpenAiChatOptions.builder()
            .model("gpt-4-turbo")                       // switch model at runtime
            .temperature(0.3)                           // lower temperature, more deterministic output
            .responseFormat(ResponseFormat.builder()
                .type(ResponseFormat.Type.JSON_OBJECT)  // force JSON; the prompt itself must mention JSON
                .build())
            .build();
        
        Prompt prompt = new Prompt(req.getContent(), options);
        return chatModel.call(prompt);
    }
    
    // 4. Tool calling (function calling)
    @PostMapping("/with-tools")
    public ChatResponse withTools(@RequestBody QueryRequest req) {
        
        // Turn @Tool-annotated methods into tool callbacks
        // (CalculatorTools is a second tool class, not shown here)
        ToolCallback[] toolCallbacks = ToolCallbacks.from(
            new WeatherTools(),
            new CalculatorTools()
        );
        
        OpenAiChatOptions options = OpenAiChatOptions.builder()
            .model("gpt-5-mini")
            .toolCallbacks(toolCallbacks)
            .toolChoice("auto")  // let the model decide which tool to call
            .build();
        
        // By default Spring AI executes the requested tools and sends the
        // results back to the model, so no manual tool-call handling is needed
        Prompt prompt = new Prompt(req.getContent(), options);
        return chatModel.call(prompt);
    }
    
    // 5. Streaming response
    @GetMapping("/stream")
    public Flux<String> stream(@RequestParam String message) {
        Prompt prompt = new Prompt(message);
        
        return chatModel.stream(prompt)
            .map(response -> response.getResult().getOutput().getText());
    }
}

// Tool definition example: @Tool methods are exposed to the model
public class WeatherTools {
    
    @Tool(description = "Get weather information for a city")
    public String getWeather(@ToolParam(description = "City name") String city) {
        // Replace with a real weather lookup
        return "{\"weather\": \"sunny\", \"temp\": 25}";
    }
}

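3.4 Cost Optimization Tips

OpenAI pricing as quoted by the author (2026; check the current price list before budgeting):

Model	Input	Output	Use case
gpt-5-mini	$0.075/M	$0.3/M	First choice: lowest cost
gpt-4-turbo	$10/M	$30/M	Complex reasoning
gpt-4	$30/M	$60/M	Hardest reasoning tasks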
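
To make the per-million-token prices concrete, here is a minimal sketch that estimates the cost of a single request. `RequestCostEstimator` and `Price` are hypothetical names, and the prices are the table's figures taken as given, not verified list prices.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

/** Hypothetical sketch: estimate the USD cost of one request from per-million-token prices. */
final class RequestCostEstimator {

    /** Price in USD per one million tokens, split by direction. */
    record Price(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {}

    private static final BigDecimal MILLION = BigDecimal.valueOf(1_000_000);

    // Figures copied from the table above (assumed, not verified)
    private static final Map<String, Price> PRICES = Map.of(
        "gpt-5-mini", new Price(new BigDecimal("0.075"), new BigDecimal("0.3")),
        "gpt-4-turbo", new Price(new BigDecimal("10"), new BigDecimal("30")),
        "gpt-4", new Price(new BigDecimal("30"), new BigDecimal("60")));

    /** Returns the estimated cost in USD, rounded to six decimal places. */
    static BigDecimal estimateUsd(String model, long inputTokens, long outputTokens) {
        Price price = PRICES.get(model);
        if (price == null) {
            throw new IllegalArgumentException("No price configured for model: " + model);
        }
        BigDecimal input = price.inputPerMillion().multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal output = price.outputPerMillion().multiply(BigDecimal.valueOf(outputTokens));
        return input.add(output).divide(MILLION, 6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // 1,000 input tokens and 500 output tokens on gpt-4-turbo
        System.out.println(estimateUsd("gpt-4-turbo", 1_000, 500));
    }
}
```

BigDecimal is used instead of double so that accumulated totals do not drift through floating-point rounding.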

// Cost optimization strategy: dynamic model selection
public class CostOptimizedChatService {
    
    private final ChatModel chatModel;
    
    public CostOptimizedChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    /**
     * Picks a model based on the estimated complexity of the input.
     */
    public ChatResponse callWithOptimization(String userMessage) {
        
        int estimatedComplexity = estimateComplexity(userMessage);
        
        OpenAiChatOptions options = OpenAiChatOptions.builder()
            .model(selectModelByComplexity(estimatedComplexity))
            .temperature(0.5)
            .build();
        
        Prompt prompt = new Prompt(userMessage, options);
        return chatModel.call(prompt);
    }
    
    private String selectModelByComplexity(int complexity) {
        if (complexity < 3) {
            return "gpt-5-mini";      // simple Q&A, lowest cost
        } else if (complexity < 7) {
            return "gpt-4-turbo";     // medium complexity
        } else {
            return "gpt-4";           // highly complex reasoning
        }
    }
    
    private int estimateComplexity(String message) {
        // Crude heuristic: question marks (ASCII and full-width) plus length.
        // Whitespace splitting undercounts languages written without spaces.
        int questionMarks = (int) message.chars().filter(c -> c == '?' || c == '？').count();
        int words = message.split("\\s+").length;
        return Math.min(10, questionMarks + words / 100);
    }
}

4. Google Gemini Configuration in Practice

4.1 Dependency Declaration

<!-- Vertex AI (hosted on Google Cloud) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-vertex-ai-gemini</artifactId>
    <version>2.0.0-M1</version>
</dependency>

<!-- Or the Google Gen AI API (API key, free tier for trials) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-google-genai</artifactId>
    <version>2.0.0-M1</version>
</dependency>

4.2 Configuration Properties

# Vertex AI configuration (enterprise)
spring:
  ai:
    vertex:
      ai:
        gemini:
          project-id: my-gcp-project
          location: us-central1              # region
          credentials-uri: file:///path/to/credentials.json
          api-endpoint: https://us-central1-aiplatform.googleapis.com
          transport: GRPC                    # gRPC generally outperforms REST
          
          chat:
            options:
              model: gemini-2.0-flash         # 2.0 model, fast
              # model: gemini-1.5-pro         # long context (1M+ tokens)
              temperature: 1.0                # suggested default for Gemini
              top-p: 0.95
              max-output-tokens: 8192
              
              # Thinking Mode (reasoning); property name as given by the author,
              # verify it against the reference docs for your starter
              thinking-level: medium          # DISABLED / LOW / MEDIUM / HIGH
              
              # Safety settings
              safety-settings:
                - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
                  threshold: BLOCK_ONLY_HIGH

---
# Google Gen AI API configuration (development/trial); a separate YAML document,
# because one document cannot contain two top-level "spring" keys
spring:
  ai:
    google:
      genai:
        api-key: AIzaSyXxx                  # from https://aistudio.google.com/app/apikey
        
        chat:
          options:
            model: gemini-2.0-flash
            temperature: 1.0
            top-p: 0.95
            max-output-tokens: 8192
4.3 Programming Patterns

// Note: GeminiChatOptions, ThinkingLevel, SafetySetting, HarmCategory, and SafetyThreshold
// are this article's shorthand. The concrete class names depend on the starter you use
// (for example, VertexAiGeminiChatOptions with the Vertex AI starter).
@RestController
public class GeminiController {
    
    private final ChatModel chatModel;
    
    public GeminiController(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    // 1. Basic call
    @GetMapping("/simple")
    public String simple(@RequestParam String prompt) {
        return chatModel.call(prompt);
    }
    
    // 2. Thinking Mode: suited to complex reasoning tasks
    @PostMapping("/complex-reasoning")
    public ChatResponse complexReasoning(@RequestBody QueryRequest req) {
        
        GeminiChatOptions options = GeminiChatOptions.builder()
            .model("gemini-2.0-pro")               // Pro tier supports Thinking
            .thinkingLevel(ThinkingLevel.MEDIUM)   // reasoning depth
            .temperature(0.7)
            .maxOutputTokens(8192)
            .build();
        
        Prompt prompt = new Prompt(req.getContent(), options);
        ChatResponse response = chatModel.call(prompt);
        
        // Optionally extract the reasoning trace
        // response.getMetadata().get("thinkingProcess")
        
        return response;
    }
    
    // 3. Multimodal input: image understanding
    @PostMapping("/vision")
    public ChatResponse visionUnderstanding(@RequestParam String imageUrl) {
        
        // Build a multimodal message: text plus one image
        UserMessage userMessage = UserMessage.builder()
            .text("Please describe this image in detail")
            .media(new Media(MimeTypeUtils.IMAGE_JPEG, URI.create(imageUrl)))
            .build();
        
        Prompt prompt = new Prompt(List.of(userMessage));
        return chatModel.call(prompt);
    }
    
    // 4. Long-context processing (Gemini 1.5 Pro accepts 1M+ tokens)
    @PostMapping("/long-context")
    public ChatResponse longContextAnalysis(
            @RequestParam String documentPath,
            @RequestParam String question) throws Exception {
        
        // Read a long document (it can be an entire book).
        // In real code, validate documentPath to prevent path traversal.
        String documentContent = Files.readString(
            Paths.get(documentPath)
        );
        
        String combinedPrompt = String.format(
            "Document:\n%s\n\nQuestion: %s",
            documentContent,
            question
        );
        
        GeminiChatOptions options = GeminiChatOptions.builder()
            .model("gemini-1.5-pro")      // long-context model
            .maxOutputTokens(4096)
            .build();
        
        Prompt prompt = new Prompt(combinedPrompt, options);
        return chatModel.call(prompt);
    }
    
    // 5. Safety policy configuration
    @PostMapping("/safe-generation")
    public ChatResponse safeGeneration(@RequestBody QueryRequest req) {
        
        List<SafetySetting> safetySettings = List.of(
            SafetySetting.builder()
                .withCategory(HarmCategory.SEXUALLY_EXPLICIT)
                .withThreshold(SafetyThreshold.BLOCK_ONLY_HIGH)
                .build(),
            SafetySetting.builder()
                .withCategory(HarmCategory.VIOLENCE)
                .withThreshold(SafetyThreshold.BLOCK_ONLY_HIGH)
                .build()
        );
        
        GeminiChatOptions options = GeminiChatOptions.builder()
            .model("gemini-2.0-flash")
            .safetySettings(safetySettings)
            .build();
        
        Prompt prompt = new Prompt(req.getContent(), options);
        return chatModel.call(prompt);
    }
}

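4.4 Thinking Level Tuning Guide

Thinking Mode is a Gemini feature that lets the model reason in depth before it produces an answer.

# Rules of thumb for choosing a Thinking Level
ThinkingLevel.DISABLED:
  - Use for: quick replies, simple Q&A
  - Latency: < 1 s
  - Cost: base price
  - Example: "List five cities"

ThinkingLevel.LOW:
  - Use for: moderately difficult questions, basic analysis
  - Latency: 1-3 s
  - Cost: +0.1x
  - Example: "Compare the pros and cons of A and B"

ThinkingLevel.MEDIUM:
  - Use for: complex reasoning, code generation, problem solving
  - Latency: 3-8 s
  - Cost: +0.5x
  - Example: "Design an e-commerce system architecture"

ThinkingLevel.HIGH:
  - Use for: very complex reasoning, scientific computing, multi-step problems
  - Latency: 8-30 s
  - Cost: +1.0-2.0x
  - Example: "Prove a complex mathematical theorem"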
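
One way to apply the guide is to start from a latency budget rather than from the question text. The sketch below picks the deepest level whose upper latency bound fits the budget. `ThinkingTier` is a hypothetical stand-in enum, and the bounds are the upper ends of the latency ranges in the guide, not measured values.

```java
/** Hypothetical sketch: pick the deepest thinking tier that fits a latency budget. */
enum ThinkingTier {
    // Upper latency bounds in seconds, taken from the guide above
    DISABLED(1), LOW(3), MEDIUM(8), HIGH(30);

    private final int maxLatencySeconds;

    ThinkingTier(int maxLatencySeconds) {
        this.maxLatencySeconds = maxLatencySeconds;
    }

    int maxLatencySeconds() {
        return maxLatencySeconds;
    }

    /** Returns the deepest tier whose upper bound is within the budget; DISABLED if none fits. */
    static ThinkingTier deepestWithin(int budgetSeconds) {
        ThinkingTier best = DISABLED;
        for (ThinkingTier tier : values()) {   // values() are declared in ascending latency order
            if (tier.maxLatencySeconds <= budgetSeconds) {
                best = tier;
            }
        }
        return best;
    }
}
```

For example, an interactive endpoint with a 5-second budget would resolve to LOW, while a batch job with a 60-second budget could use HIGH.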

Cost optimization tips:

public class GeminiCostOptimizer {
    
    private final ChatModel chatModel;
    
    public GeminiCostOptimizer(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    /**
     * Chooses a Thinking Level from the estimated complexity of the question.
     */
    public ChatResponse optimizedCall(String question) {
        
        ThinkingLevel level = analyzeQuestionComplexity(question);
        
        GeminiChatOptions options = GeminiChatOptions.builder()
            .model("gemini-2.0-pro")
            .thinkingLevel(level)  // chosen dynamically
            .build();
        
        Prompt prompt = new Prompt(question, options);
        return chatModel.call(prompt);
    }
    
    private ThinkingLevel analyzeQuestionComplexity(String question) {
        
        // Quick heuristic; lower-case first so "Why" and "why" both match
        String q = question.toLowerCase();
        
        boolean isSimpleQuestion = q.length() < 50 
            && !q.contains("why")
            && !q.contains("how");
        
        if (isSimpleQuestion) {
            return ThinkingLevel.DISABLED;  // saves cost
        }
        
        boolean isComplexReasoning = q.contains("design")
            || q.contains("architect")
            || q.contains("optimize");
        
        if (isComplexReasoning) {
            return ThinkingLevel.MEDIUM;
        }
        
        return ThinkingLevel.LOW;
    }
}

5. Anthropic Claude Configuration Guide

5.1 Dependency Declaration

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
    <version>2.0.0-M1</version>
</dependency>

5.2 Configuration Properties

spring:
  ai:
    anthropic:
      api-key: sk-ant-xxx                    # from https://console.anthropic.com/
      base-url: https://api.anthropic.com    # custom endpoints are supported
      
      chat:
        options:
          model: claude-3-5-sonnet-20240620  # recommended: good cost/performance ratio
          # model: claude-3-opus-20240229    # high end: strongest reasoning
          temperature: 0.8                   # suggested default for Claude
          top-p: 0.95
          top-k: 40                          # top-k sampling (not exposed by OpenAI's chat API)
          max-tokens: 1024
          
          # Stop sequences (optional)
          stop-sequences:
            - "\n\nHuman:"
            - "\n\nAssistant:"
5.3 Programming Interface

@RestController
public class AnthropicController {
    
    private final ChatModel chatModel;
    
    public AnthropicController(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    // 1. Basic call
    @GetMapping("/basic")
    public String basic(@RequestParam String message) {
        return chatModel.call(message);
    }
    
    // Tool input types; Spring AI derives each tool's JSON schema from them
    record DbQuery(String query, Integer limit) {}
    record ApiCall(String endpoint, String method) {}
    
    // 2. Tool calling (a core Claude strength)
    @PostMapping("/tool-calling")
    public ChatResponse toolCalling(@RequestBody QueryRequest req) {
        
        // Tool 1: database query
        ToolCallback queryDatabase = FunctionToolCallback
            .builder("query_database", (DbQuery in) -> executeTool("query_database", in))
            .description("Query the user database for information")
            .inputType(DbQuery.class)
            .build();
        
        // Tool 2: external API call
        ToolCallback callExternalApi = FunctionToolCallback
            .builder("call_external_api", (ApiCall in) -> executeTool("call_external_api", in))
            .description("Call an external REST API (method: GET, POST, PUT, or DELETE)")
            .inputType(ApiCall.class)
            .build();
        
        AnthropicChatOptions options = AnthropicChatOptions.builder()
            .model("claude-3-5-sonnet-20240620")
            .temperature(0.5)
            .toolCallbacks(queryDatabase, callExternalApi)
            .build();
        
        // Claude answers with tool_use content blocks. By default Spring AI
        // runs the matching callback and feeds the result back to the model,
        // so the returned response already reflects the tool output.
        Prompt prompt = new Prompt(req.getContent(), options);
        return chatModel.call(prompt);
    }
    
    // Routes a tool call to its handler. Placeholder: validate inputs here and
    // never run model-generated SQL or URLs unchecked.
    private Object executeTool(String toolName, Object input) {
        throw new UnsupportedOperationException("Not implemented: " + toolName);
    }
    
    // 3. Vision input
    @PostMapping("/vision")
    public ChatResponse visionAnalysis(@RequestParam String imageUrl) {
        
        UserMessage userMessage = UserMessage.builder()
            .text("Analyze this image")
            .media(new Media(MimeTypeUtils.IMAGE_JPEG, URI.create(imageUrl)))
            .build();
        
        Prompt prompt = new Prompt(List.of(userMessage));
        return chatModel.call(prompt);
    }
    
    // 4. Prompt engineering: setting a system role
    @PostMapping("/role-based")
    public ChatResponse roleBasedResponse(@RequestBody QueryRequest req) {
        
        // Claude responds strongly to the system role
        SystemMessage systemPrompt = new SystemMessage(
            "You are an expert technical architect with 15 years of experience. " +
            "Your responses should be pragmatic, code-first, and production-ready. " +
            "Avoid theoretical discussions. Focus on real-world trade-offs."
        );
        
        UserMessage userMessage = new UserMessage(req.getContent());
        
        AnthropicChatOptions options = AnthropicChatOptions.builder()
            .model("claude-3-5-sonnet-20240620")
            .maxTokens(2048)
            .temperature(0.7)
            .build();
        
        Prompt prompt = new Prompt(
            List.of(systemPrompt, userMessage),
            options
        );
        
        return chatModel.call(prompt);
    }
    
    // 5. Error handling and retry.
    // RateLimitException is a placeholder for whatever runtime exception your
    // client surfaces on HTTP 429. Spring AI also ships built-in retry,
    // configured through the spring.ai.retry.* properties.
    @PostMapping("/with-retry")
    public ChatResponse withRetry(@RequestBody QueryRequest req) {
        
        int maxRetries = 3;
        int retryCount = 0;
        ChatResponse response = null;
        
        while (retryCount < maxRetries) {
            try {
                AnthropicChatOptions options = AnthropicChatOptions.builder()
                    .model("claude-3-5-sonnet-20240620")
                    .temperature(0.5)
                    .maxTokens(2048)
                    .build();
                
                Prompt prompt = new Prompt(req.getContent(), options);
                response = chatModel.call(prompt);
                break;  // success, leave the loop
                
            } catch (RateLimitException e) {
                retryCount++;
                if (retryCount < maxRetries) {
                    // Exponential backoff: 2 s, then 4 s
                    try {
                        Thread.sleep((long) Math.pow(2, retryCount) * 1000);
                    } catch (InterruptedException ie) {
                        // Restore the flag and stop retrying instead of looping on
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted during backoff", ie);
                    }
                } else {
                    throw e;
                }
            }
        }
        
        return response;
    }
}
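
The retry loop in withRetry can be factored into a small reusable helper. The following is a minimal sketch using only the JDK: `Backoff`, its method names, and its parameters are hypothetical, and it retries on any exception, which is broader than the rate-limit case above.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry a call with capped exponential backoff. */
final class Backoff {

    /** Delay before retry number attempt (1-based): base * 2^(attempt - 1), capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        int shift = Math.min(attempt - 1, 30);   // bound the shift to avoid overflow
        return Math.min(baseMillis * (1L << shift), maxMillis);
    }

    /** Runs the call up to maxAttempts times, sleeping between failed attempts. */
    static <T> T retry(Callable<T> call, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // never retry after an interrupt
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;  // every attempt failed; surface the final error
    }
}
```

A call site might look like `Backoff.retry(() -> chatModel.call(prompt), 3, 1000, 30000)`. In production, adding random jitter to the delay helps avoid many clients retrying in lockstep.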

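5.4 Claude Model Selection Matrix

Model	Capability	Speed	Cost (input)	Recommended for
Claude 3.5 Sonnet	8/10	9/10	$3/M	First choice: best cost/performance ratio
Claude 3.5 Haiku	6/10	10/10	$0.8/M	Simple tasks, high call volume
Claude 3 Opus	10/10	7/10	$15/M	Very complex reasoning, research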

6. Ollama: Local, Private Deployment

6.1 Dependency Declaration

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
    <version>2.0.0-M1</version>
</dependency>

6.2 Local Deployment

# 1. Install Ollama (https://ollama.ai)
# One-step installers exist for macOS, Windows, and Linux

# 2. Pull models
ollama pull mistral           # general-purpose, balanced performance
ollama pull neural-chat       # tuned for dialogue
ollama pull codellama         # code generation
ollama pull llama2            # a representative open model
ollama pull nomic-embed-text  # embedding model used in section 6.3

# 3. Verify the service
curl http://localhost:11434/api/tags

6.3 Spring Boot Configuration

spring:
  ai:
    ollama:
      base-url: http://localhost:11434     # local or remote address
      timeout: 120s                        # large models may need more time
      
      chat:
        options:
          model: mistral                    # recommended model
          # model: neural-chat
          # model: codellama
          temperature: 0.7
          top-p: 0.9
          top-k: 40
          repeat-last-n: 64                # how far back to look when penalizing repetition
          repeat-penalty: 1.1              # discourages repetition
          num-predict: 256                 # maximum tokens to generate
      
      # Embedding settings are a sibling of chat, not nested under chat.options
      embedding:
        options:
          model: nomic-embed-text          # produces 768-dimensional embeddings

6.4 Structured Output and Embeddings

@RestController
public class OllamaController {
    
    private final ChatModel chatModel;
    private final EmbeddingModel embeddingModel;
    private final VectorStore vectorStore;
    
    public OllamaController(ChatModel chatModel, 
                           EmbeddingModel embeddingModel,
                           VectorStore vectorStore) {
        this.chatModel = chatModel;
        this.embeddingModel = embeddingModel;
        this.vectorStore = vectorStore;
    }
    
    // 1. Basic call (fully private)
    @GetMapping("/private-chat")
    public String privateChat(@RequestParam String message) {
        return chatModel.call(message);
    }
    
    // 2. Structured output
    @PostMapping("/structured-output")
    public UserProfile extractUserProfile(@RequestBody String description) {
        
        // BeanOutputConverter derives JSON format instructions from UserProfile
        BeanOutputConverter<UserProfile> converter =
            new BeanOutputConverter<>(UserProfile.class);
        
        String prompt = String.format(
            """
            Extract the user's name, age, skills, and years of experience.
            %s
            
            User description: %s
            """,
            converter.getFormat(),
            description
        );
        
        OllamaChatOptions options = OllamaChatOptions.builder()
            .model("mistral")
            .temperature(0.3)  // low temperature keeps the format consistent
            .numPredict(256)
            .build();
        
        Prompt p = new Prompt(prompt, options);
        ChatResponse response = chatModel.call(p);
        
        // Parse the JSON reply into a UserProfile
        return converter.convert(response.getResult().getOutput().getText());
    }
    
    // Response DTO, named to avoid clashing with Spring AI's own EmbeddingResponse
    record EmbeddingResult(String text, float[] vector, int dimensions) {}
    
    // 3. Embedding generation (embeds only the first text, as in the original example)
    @PostMapping("/embeddings")
    public EmbeddingResult generateEmbeddings(@RequestBody List<String> texts) {
        
        // Generate the embedding vector with Ollama
        float[] embedding = embeddingModel.embed(texts.get(0));
        
        // Dimension = 768 for nomic-embed-text.
        // Usable for semantic search, similarity scoring, and vector-store indexing.
        
        return new EmbeddingResult(
            texts.get(0),
            embedding,
            embedding.length
        );
    }
    
    // 4. Local RAG
    @PostMapping("/local-rag")
    public ChatResponse localRAG(@RequestBody RAGQuery query) {
        
        // Step 1: retrieve related documents from the local vector store
        List<Document> relatedDocs = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(query.getQuestion())
                .topK(query.getTopK() != null ? query.getTopK() : 4)
                .build()
        );
        
        // Step 2: build the augmented prompt
        String context = relatedDocs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n---\n"));
        
        String augmentedPrompt = String.format(
            """
            Context:
            %s
            
            Question: %s
            
            Answer based on the context above:
            """,
            context,
            query.getQuestion()
        );
        
        // Step 3: generate the reply with the local model
        return chatModel.call(new Prompt(augmentedPrompt));
    }
    
    // 5. Batch processing
    @PostMapping("/batch-processing")
    public List<String> batchProcessing(@RequestBody List<String> prompts) {
        
        // parallelStream uses the common pool; a single local Ollama instance
        // may still process these requests largely one at a time
        return prompts.parallelStream()
            .map(prompt -> {
                try {
                    OllamaChatOptions options = OllamaChatOptions.builder()
                        .model("mistral")
                        .numPredict(256)
                        .build();
                    
                    ChatResponse response = chatModel.call(
                        new Prompt(prompt, options)
                    );
                    return response.getResult().getOutput().getText();
                } catch (Exception e) {
                    return "Error: " + e.getMessage();
                }
            })
            .collect(Collectors.toList());
    }
}

// Data classes
@Data
class UserProfile {
    private String name;
    private Integer age;
    private List<String> skills;
    private Integer experienceYears;
}

@Data
class RAGQuery {
    private String question;
    private Integer topK;  // number of most relevant documents to return
}
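
The embedding endpoint mentions similarity scoring as one use for the vectors. A common measure is cosine similarity; the sketch below computes it for two `float[]` vectors such as those returned by an embedding model. `VectorMath` is a hypothetical helper, not part of Spring AI.

```java
/** Hypothetical helper: cosine similarity between two embedding vectors. */
final class VectorMath {

    /** Returns dot(a, b) / (|a| * |b|), a value in [-1, 1]. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score -1. Only compare vectors produced by the same embedding model, since dimensions and geometry differ between models.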

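7. A Unified Cross-Model Configuration Strategy

7.1 Portable Options in Practice

The core question: how do you switch between models without touching business code?

@Configuration
public class MultiModelChatConfig {
    
    /**
     * ChatService depends only on the ChatModel interface.
     * With a single provider starter on the classpath you can inject ChatModel
     * directly; with several, inject the concrete beans as shown here.
     */
    @Bean
    public ChatService chatService(OpenAiChatModel primary, OllamaChatModel localFallback) {
        return new ChatService(primary, localFallback);
    }
}

// Registered through MultiModelChatConfig, so no @Service annotation here
// (having both would define the bean twice)
public class ChatService {
    
    private final ChatModel chatModel;       // primary provider
    private final ChatModel fallbackModel;   // local fallback; may be null
    
    public ChatService(ChatModel chatModel, ChatModel fallbackModel) {
        this.chatModel = chatModel;
        this.fallbackModel = fallbackModel;
    }
    
    /**
     * A fully model-agnostic call.
     * Business code deals only with Portable Options and never needs to know
     * which model is behind the interface.
     */
    public String generateContent(String userPrompt, GenerationSettings settings) {
        return generate(chatModel, userPrompt, settings);
    }
    
    private String generate(ChatModel model, String userPrompt, GenerationSettings settings) {
        
        // Build cross-model options with the portable builder
        ChatOptions.Builder builder = ChatOptions.builder()
            .temperature(settings.getTemperature())
            .topP(settings.getTopP())
            .maxTokens(settings.getMaxTokens());
        if (settings.getModel() != null) {
            builder.model(settings.getModel());  // otherwise the configured default applies
        }
        
        Prompt prompt = new Prompt(userPrompt, builder.build());
        ChatResponse response = model.call(prompt);
        
        return response.getResult().getOutput().getText();
    }
    
    /**
     * Optional: switch models at runtime (advanced usage).
     * Changing the model name alone cannot change provider; the fallback
     * must go through a different ChatModel bean.
     */
    public String generateWithFallback(
            String userPrompt,
            GenerationSettings settings) {
        
        try {
            // Try the primary first (gpt-5-mini, lowest cost)
            return generate(chatModel, userPrompt, settings);
            
        } catch (RuntimeException e) {
            
            // On network failure or timeout, degrade to Ollama (local backup)
            if (fallbackModel != null && isNetworkError(e)) {
                settings.setModel("mistral");  // switch to the local model
                return generate(fallbackModel, userPrompt, settings);
            }
            throw e;
        }
    }
    
    private boolean isNetworkError(Throwable e) {
        // HTTP clients usually wrap I/O failures, so walk the cause chain
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof IOException || t instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }
}

@Data
class GenerationSettings {
    private String model;              // optional explicit model
    private Double temperature = 0.7;
    private Double topP = 0.9;
    private Integer maxTokens = 1024;
}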

7.2 Switching Models with Profile-Specific Files

Use Spring Profiles to run a different model in each environment. A profile-specific file must not set spring.profiles.active (Spring Boot rejects it there), so each file selects its provider with spring.ai.model.chat instead:

# application-prod.yml (production: cost first)
spring:
  ai:
    model:
      chat: openai
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-5-mini
          temperature: 0.5

# application-staging.yml (staging: quality first)
spring:
  ai:
    model:
      chat: vertexai
    vertex:
      ai:
        gemini:
          chat:
            options:
              model: gemini-2.0-pro
              thinking-level: MEDIUM

# application-dev.yml (development: fully private)
spring:
  ai:
    model:
      chat: ollama
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: mistral

# application-test.yml (testing: fast feedback)
spring:
  ai:
    model:
      chat: anthropic
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-3-5-haiku-latest

Start the application:

# Switch environments
java -Dspring.profiles.active=prod -jar app.jar
java -Dspring.profiles.active=dev -jar app.jar
8. Configuration Comparison Summary

Setting	OpenAI	Gemini	Anthropic	Ollama
Deployment	SaaS	SaaS/Vertex AI	SaaS	Local
Authentication	API key	GCP credentials/API key	API key	None
Recommended model	gpt-5-mini	gemini-2.0-flash	claude-3-5-sonnet	mistral
Temperature range	0-2	0-2	0-1	0-2
Tool calling	✅ tools	✅ function_calling	✅✅✅ (best)	✅ (model-dependent)
Vision input	✅	✅✅ (multimodal)	✅	✅ (multimodal models such as llava)
Streaming	✅	✅	✅	✅
Thinking Mode	❌	✅ (Thinking Level)	❌	❌
Long context	128K	1M+ (1.5 Pro)	200K	Model-dependent (often 4K/8K)
Cost	$0.3/M (output)	$3-5/M	$3/M	0 (self-hosted)
Maturity	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐

九、生产环境最佳实践

9.1 错误处理与降级

java 复制代码

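import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

// RateLimitException, NetworkException and TimeoutException are assumed to be
// application-defined unchecked exceptions that your client layer maps provider
// errors to; they are not Spring AI classes.
@Service
public class ResilientChatService {
    
    private static final Logger logger = LoggerFactory.getLogger(
        ResilientChatService.class
    );
    
    private final ChatModel primaryModel;      // OpenAI
    private final ChatModel fallbackModel;     // Claude
    private final ChatModel localModel;        // Ollama
    
    // The qualifiers reuse the bean names from Q1 below; adjust them to the
    // names registered in your own application context.
    public ResilientChatService(
            @Qualifier("openaiChatModel") ChatModel primaryModel,
            @Qualifier("anthropicChatModel") ChatModel fallbackModel,
            @Qualifier("ollamaChatModel") ChatModel localModel) {
        this.primaryModel = primaryModel;
        this.fallbackModel = fallbackModel;
        this.localModel = localModel;
    }
    
    /**
     * Three-level fallback strategy:
     * Level 1: prefer the cheapest model (gpt-5-mini)
     * Level 2: on rate limiting, fall back to Claude
     * Level 3: on network failure, use the local Ollama model
     */
    public String call(String prompt) {
        
        try {
            return callWithPrimary(prompt);
        } catch (RateLimitException e) {
            logger.warn("Primary model rate limited, falling back to secondary");
            try {
                return callWithFallback(prompt);
            } catch (Exception e2) {
                logger.warn("Fallback model failed, using local model");
                return callWithLocal(prompt);
            }
        } catch (NetworkException | TimeoutException e) {
            logger.warn("Network error, using local model immediately");
            return callWithLocal(prompt);
        }
    }
    
    private String callWithPrimary(String prompt) {
        // OpenAI gpt-5-mini
        ChatResponse response = primaryModel.call(new Prompt(prompt));
        // getText() is the accessor in Spring AI 1.0+; older milestones used getContent()
        return response.getResult().getOutput().getText();
    }
    
    private String callWithFallback(String prompt) {
        // Claude, which has a higher rate limit
        ChatResponse response = fallbackModel.call(new Prompt(prompt));
        return response.getResult().getOutput().getText();
    }
    
    private String callWithLocal(String prompt) {
        // Local Ollama, no network dependency
        ChatResponse response = localModel.call(new Prompt(prompt));
        return response.getResult().getOutput().getText();
    }
}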
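
The three-level strategy above hard-codes the order of providers in nested try/catch blocks. As a rough sketch of the same idea in a more general form, the class below walks an ordered list of providers and returns the first answer that succeeds. `FallbackChain` and `Provider` are hypothetical names introduced for this illustration, and each provider is modeled as a plain `Function<String, String>` so the example runs without Spring on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Tries each provider in order and returns the first successful answer. */
final class FallbackChain {

    /** A named provider call; the name is only used for error reporting. */
    record Provider(String name, Function<String, String> call) {}

    private final List<Provider> providers;

    FallbackChain(List<Provider> providers) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("at least one provider is required");
        }
        this.providers = List.copyOf(providers);
    }

    String call(String prompt) {
        List<String> failures = new ArrayList<>();
        for (Provider provider : providers) {
            try {
                return provider.call().apply(prompt);
            } catch (RuntimeException e) {
                // Remember why this provider failed, then move on to the next one
                failures.add(provider.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("all providers failed: " + failures);
    }

    public static void main(String[] args) {
        FallbackChain chain = new FallbackChain(List.of(
            new Provider("primary", p -> { throw new IllegalStateException("rate limited"); }),
            new Provider("local", p -> "local answer to: " + p)));
        System.out.println(chain.call("hello"));
    }
}
```

In a Spring application each entry could wrap a model call, for example `new Provider("openai", p -> primaryModel.call(p))`. Unlike the service above, this sketch treats every runtime exception the same way; distinguishing rate limits from network failures would need an extra predicate per provider.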

9.2 Cost Monitoring and Alerts

java

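import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class CostMonitor {
    
    private static final Logger logger = LoggerFactory.getLogger(CostMonitor.class);
    
    // Prices in cents per million tokens: {input, output}
    private static final Map<String, long[]> PRICING = Map.of(
        "gpt-5-mini", new long[]{7L, 30L},
        "gpt-4-turbo", new long[]{1000L, 3000L},
        "claude-3-5-sonnet", new long[]{300L, 1500L},
        "gemini-2.0-pro", new long[]{100L, 400L}
    );
    
    // tokens x (cents per million tokens) gives micro-cents. Accumulating in
    // micro-cents keeps small calls from being rounded down to 0 cents.
    private static final long MICRO_CENTS_PER_DOLLAR = 100L * 1_000_000L;
    private static final long ALERT_STEP = 100L * MICRO_CENTS_PER_DOLLAR;  // $100
    
    private final AtomicLong totalTokensUsed = new AtomicLong(0);
    private final AtomicLong totalCostMicroCents = new AtomicLong(0);
    
    /**
     * Record the cost of one API call
     */
    public void recordCost(String model, int inputTokens, int outputTokens) {
        
        long cost = calculateCost(model, inputTokens, outputTokens);
        totalTokensUsed.addAndGet((long) inputTokens + outputTokens);
        long after = totalCostMicroCents.addAndGet(cost);
        long before = after - cost;
        
        // Warn each time the running total crosses another $100 boundary
        if (after / ALERT_STEP > before / ALERT_STEP) {
            logger.warn(
                "Cost reached ${} with {} total tokens used",
                (double) after / MICRO_CENTS_PER_DOLLAR,
                totalTokensUsed.get()
            );
        }
    }
    
    // Returns the cost in micro-cents; unknown models are counted as free
    private long calculateCost(String model, int inputTokens, 
                              int outputTokens) {
        
        long[] prices = PRICING.getOrDefault(model, new long[]{0L, 0L});
        
        return inputTokens * prices[0] + outputTokens * prices[1];
    }
}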
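
The pricing map above is expressed in cents per million tokens, so a single call usually costs a small fraction of a cent, and the unit used for accumulation matters. The sketch below works through the arithmetic for one call using the gpt-5-mini and claude-3-5-sonnet rates from that map. `CostMath` and its method names are hypothetical helpers for this illustration only.

```java
import java.util.Map;

final class CostMath {

    // {input, output} prices in cents per million tokens, taken from the pricing map above
    static final Map<String, long[]> PRICING = Map.of(
        "gpt-5-mini", new long[]{7L, 30L},
        "claude-3-5-sonnet", new long[]{300L, 1500L});

    /** Cost in micro-cents (millionths of a cent); no division, so nothing is rounded away. */
    static long microCents(String model, int inputTokens, int outputTokens) {
        long[] p = PRICING.getOrDefault(model, new long[]{0L, 0L});
        return inputTokens * p[0] + outputTokens * p[1];
    }

    /** The same cost truncated to whole cents, as integer division by 1_000_000 would give. */
    static long wholeCents(String model, int inputTokens, int outputTokens) {
        return microCents(model, inputTokens, outputTokens) / 1_000_000;
    }

    public static void main(String[] args) {
        // 1,000 input tokens x 7 + 500 output tokens x 30 = 22,000 micro-cents
        System.out.println(microCents("gpt-5-mini", 1_000, 500));  // prints 22000
        System.out.println(wholeCents("gpt-5-mini", 1_000, 500));  // prints 0
    }
}
```

A call of this size costs 0.022 cents, which truncates to zero whole cents. Summing per-call values in whole cents would therefore under-count badly, which is why a finer unit (or a `BigDecimal`) is the safer choice for a running total.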

9.3 Caching Strategy

java

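import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching
public class CacheConfig {
    
    /**
     * Cache responses to identical inputs for 24 hours.
     * This is especially effective at cutting cost for repetitive
     * workloads such as FAQs and document generation.
     */
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager();
        cacheManager.setCaffeine(Caffeine.newBuilder()
            .expireAfterWrite(24, TimeUnit.HOURS)
            .maximumSize(10000)
        );
        return cacheManager;
    }
}

@Service
public class CachedChatService {
    
    private final ChatModel chatModel;
    
    public CachedChatService(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    // The key is a hex string: a raw byte[] digest would never produce a
    // cache hit, because Java arrays compare by identity, not by content.
    @Cacheable(value = "chatResponses", 
               key = "T(org.springframework.util.DigestUtils)" +
                     ".md5DigestAsHex(#prompt.getBytes())")
    public String cachedCall(String prompt) {
        return chatModel.call(prompt);
    }
}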
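
A cache key only helps if equal prompts produce keys that compare equal. A raw `byte[]` digest does not satisfy that, because Java arrays inherit identity-based `equals` and `hashCode`, while a hex string of the same digest does. The sketch below shows the difference with the JDK alone. `PromptCacheKey` is a hypothetical helper name, and SHA-256 is used here as one reasonable choice of digest, not something the caching setup above requires.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PromptCacheKey {

    /** Raw SHA-256 digest of the prompt. */
    static byte[] digest(String prompt) {
        try {
            return MessageDigest.getInstance("SHA-256")
                .digest(prompt.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Hex-encoded digest; equal prompts always yield equal strings. */
    static String of(String prompt) {
        byte[] bytes = digest(prompt);
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Two digests of the same prompt are different array objects
        System.out.println(digest("hi").equals(digest("hi")));  // prints false
        // The hex strings compare by content
        System.out.println(of("hi").equals(of("hi")));          // prints true
    }
}
```

Whether to hash at all is a design choice: using the prompt itself as the key is simpler, while a digest keeps keys short when prompts are long.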

10. Model Selection Decision Tree

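A new project requirement arrives
    ↓
[Need private, on-premises deployment?]
    ├─ YES → Ollama
    │  └─ Traits: no API fees, full control, low latency
    │  └─ Use cases: intranet environments, sensitive data, predictable traffic
    │
    └─ NO → [Need tool calling?]
       ├─ YES → [Per-call cost budget?]
       │  ├─ [Low] → Claude 3.5 Haiku ($0.80/M input)
       │  ├─ [Medium] → Claude 3.5 Sonnet ($3/M input)
       │  └─ [High] → Claude 3 Opus ($15/M input)
       │
       └─ NO → [Need reasoning ability?]
          ├─ YES → [Latency budget for reasoning?]
          │  ├─ [Tight] → Gemini 2.0 Flash (THINKING:LOW)
          │  └─ [Generous] → Gemini 1.5 Pro (THINKING:MEDIUM/HIGH)
          │
          └─ NO → OpenAI GPT-5-mini
             └─ Traits: cheapest, fastest, first choice for production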
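
The tree above can also be written down as a small function, which makes the selection rules testable and easy to revise as pricing or requirements change. The following is only a sketch of that idea: `ModelSelector`, `Requirements`, and `Budget` are hypothetical names, and the returned strings simply repeat the labels used in the tree.

```java
final class ModelSelector {

    enum Budget { LOW, MEDIUM, HIGH }

    /** The questions asked by the decision tree, in the order they are asked. */
    record Requirements(boolean needsPrivateDeployment,
                        boolean needsToolCalling,
                        Budget costBudget,
                        boolean needsReasoning,
                        boolean tightLatency) {}

    static String select(Requirements r) {
        if (r.needsPrivateDeployment()) {
            return "Ollama";
        }
        if (r.needsToolCalling()) {
            // The cost budget only matters on the tool-calling branch
            return switch (r.costBudget()) {
                case LOW -> "Claude 3.5 Haiku";
                case MEDIUM -> "Claude 3.5 Sonnet";
                case HIGH -> "Claude 3 Opus";
            };
        }
        if (r.needsReasoning()) {
            return r.tightLatency()
                ? "Gemini 2.0 Flash (thinking: LOW)"
                : "Gemini 1.5 Pro (thinking: MEDIUM/HIGH)";
        }
        return "OpenAI GPT-5-mini";
    }

    public static void main(String[] args) {
        Requirements toolHeavy =
            new Requirements(false, true, Budget.MEDIUM, false, false);
        System.out.println(select(toolHeavy));  // prints Claude 3.5 Sonnet
    }
}
```

Because the private-deployment check comes first, it overrides every other answer, exactly as in the tree.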

11. FAQ and Debugging

Q1: How do I test different model configurations locally?

java

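import java.util.Map;

import org.springframework.ai.chat.model.ChatModel;
import org.springframework.context.ApplicationContext;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/test")
public class ModelTestController {
    
    private final ApplicationContext applicationContext;
    
    public ModelTestController(ApplicationContext applicationContext) {
        this.applicationContext = applicationContext;
    }
    
    /**
     * Look up a ChatModel implementation by bean name at request time
     */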
    @GetMapping("/compare/{modelType}")
    public Map<String, Object> compareModels(
            @PathVariable String modelType,
            @RequestParam String prompt) {
        
        ChatModel model = switch (modelType) {
            case "openai" -> 
                applicationContext.getBean("openaiChatModel", ChatModel.class);
            case "gemini" -> 
                applicationContext.getBean("geminiChatModel", ChatModel.class);
            case "anthropic" -> 
                applicationContext.getBean("anthropicChatModel", 
                                          ChatModel.class);
            case "ollama" -> 
                applicationContext.getBean("ollamaChatModel", ChatModel.class);
            default -> throw new IllegalArgumentException("Unknown model");
        };
        
        long startTime = System.currentTimeMillis();
        String response = model.call(prompt);
        long duration = System.currentTimeMillis() - startTime;
        
        return Map.of(
            "model", modelType,
            "response", response,
            "latency_ms", duration
        );
    }
}

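Q2: How should temperature and top_p be set?

Scenario	temperature	top_p	Notes
Precise Q&A	0.0 - 0.3	0.8	Deterministic output
General chat	0.5 - 0.7	0.9	Recommended default
Creative writing	0.8 - 1.0	0.95	More diversity
Code generation	0.2 - 0.4	0.85	Syntactic correctness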
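
One way to keep these settings consistent across a codebase is to name the scenarios and look the values up, rather than scattering literals through the code. The sketch below does that in plain Java. `SamplingPresets`, `Scenario`, and `Sampling` are hypothetical names, and the temperatures are single values picked from inside each range in the table, so treat them as starting points. The `clamp` helper reflects the narrower 0-1 temperature range listed for Anthropic in the comparison table.

```java
import java.util.EnumMap;
import java.util.Map;

final class SamplingPresets {

    enum Scenario { PRECISE_QA, GENERAL_CHAT, CREATIVE, CODE }

    record Sampling(double temperature, double topP) {}

    private static final Map<Scenario, Sampling> PRESETS = new EnumMap<>(Scenario.class);

    static {
        // Each temperature is one value chosen from the range in the table above
        PRESETS.put(Scenario.PRECISE_QA, new Sampling(0.2, 0.8));
        PRESETS.put(Scenario.GENERAL_CHAT, new Sampling(0.7, 0.9));
        PRESETS.put(Scenario.CREATIVE, new Sampling(0.9, 0.95));
        PRESETS.put(Scenario.CODE, new Sampling(0.3, 0.85));
    }

    static Sampling preset(Scenario scenario) {
        return PRESETS.get(scenario);
    }

    /** Caps the temperature at a provider's maximum, e.g. 1.0 for a 0-1 range. */
    static Sampling clamp(Sampling s, double maxTemperature) {
        return new Sampling(Math.min(s.temperature(), maxTemperature), s.topP());
    }

    public static void main(String[] args) {
        System.out.println(preset(Scenario.CODE));
        System.out.println(clamp(new Sampling(1.5, 0.9), 1.0));
    }
}
```

The resulting values would then be passed to whatever options builder the chosen provider exposes.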

Q3: How do I handle streaming responses?

java

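@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<String>> streamChat(
        @RequestParam String message) {
    
    return chatModel.stream(new Prompt(message))
        // Skip chunks that carry no text (for example, a metadata-only chunk)
        .filter(response -> response.getResult() != null
            && response.getResult().getOutput().getText() != null)
        .map(response -> ServerSentEvent.<String>builder()
            .data(response.getResult().getOutput().getText())
            .build());
}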

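Summary

Through the unified adapter pattern of Portable Options, Spring AI 2.0 lets teams switch between LLM providers with little friction. The key points:

Provider	Best for	Quick start
OpenAI	First choice for production, lowest cost	Configure an API key + gpt-5-mini
Gemini	Complex reasoning, Thinking Mode	Enable Thinking Level.MEDIUM
Anthropic	Tool calling, enterprise applications	Claude 3.5 Sonnet
Ollama	Local private deployment, zero cost	`ollama pull mistral`

The recommended production strategy is: "validate logic with Ollama in development, control cost with OpenAI in production, and fall back to Claude or the local model when necessary." This keeps development fast while keeping production cost low.

Once you have this configuration toolkit in hand, you have real flexibility in how your AI application is built and run.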