Preface
In real projects we often need to integrate several large language models at once: ChatGPT for English scenarios, Claude for long-document analysis, Qwen (Tongyi Qianwen) for Chinese customer service, and so on. This article walks through how to wrap and call multiple LLM APIs behind a single abstraction in Java, achieving a "write once, switch freely" architecture.
1. The Essence of an LLM API Call
No matter which vendor you use, an LLM API call boils down to HTTP POST + JSON:
```http
POST https://api.openai.com/v1/chat/completions
Authorization: Bearer sk-xxxx
Content-Type: application/json

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a professional Java tech blogger"},
    {"role": "user", "content": "What is Spring Boot?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
```
Once you understand this, you can make the call with Java's built-in HttpClient or any HTTP library, with no framework dependency at all.
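As a minimal sketch of that claim, the raw request above can be built with Java 11's `java.net.http.HttpClient` and nothing else (the `sk-xxxx` key is a placeholder, as in the HTTP example):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RawLlmCall {
    // Builds the same shape as the raw HTTP example: POST + JSON body + Bearer auth
    public static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = """
                {"model": "gpt-4", "messages": [{"role": "user", "content": "What is Spring Boot?"}]}""";
        HttpRequest request = buildRequest("sk-xxxx", body);
        System.out.println(request.method() + " " + request.uri());
        // Sending is one more line:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```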
2. Wrapping a Unified LLM Client
2.1 Define a Unified Interface
```java
public interface LlmClient {

    /** Synchronous chat */
    LlmResponse chat(LlmRequest request);

    /** Streaming chat */
    Flux<String> streamChat(LlmRequest request);

    /** Name of the model this client talks to */
    String getModelName();
}
```
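The `LlmRequest` / `LlmResponse` types are our own, not fixed by any framework. The implementations below assume Lombok-style builders and getters (`LlmRequest.builder()`, `getPrompt()`); as a dependency-free sketch with assumed field names, plain records work too:

```java
import java.util.List;

// Minimal data carriers for the LlmClient interface; field names are
// illustrative assumptions, extend to match your project
record ChatMessage(String role, String content) {}

record LlmRequest(String systemPrompt, String prompt,
                  List<ChatMessage> history,
                  double temperature, int maxTokens) {}

record LlmResponse(String content, int promptTokens, int completionTokens) {}

public class LlmDtoDemo {
    public static void main(String[] args) {
        LlmRequest req = new LlmRequest(
                "You are a helpful assistant", "What is Spring Boot?",
                List.of(), 0.7, 1000);
        System.out.println(req.prompt());
    }
}
```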
2.2 OpenAI Implementation
```java
@Component
@RequiredArgsConstructor
public class OpenAiClient implements LlmClient {

    private final RestTemplate restTemplate;
    private final ObjectMapper objectMapper;

    @Value("${ai.openai.api-key}")
    private String apiKey;

    @Value("${ai.openai.base-url:https://api.openai.com}")
    private String baseUrl;

    @Override
    public LlmResponse chat(LlmRequest request) {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.setBearerAuth(apiKey);

        Map<String, Object> body = buildBody(request);
        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(body, headers);

        ResponseEntity<String> response = restTemplate.exchange(
                baseUrl + "/v1/chat/completions",
                HttpMethod.POST,
                entity,
                String.class
        );
        return parseResponse(response.getBody());
    }

    @Override
    public Flux<String> streamChat(LlmRequest request) {
        // Core streaming logic
        return Flux.create(sink -> {
            // Streaming request via WebClient goes here
        });
    }

    @Override
    public String getModelName() {
        return "gpt-4-turbo";
    }

    private Map<String, Object> buildBody(LlmRequest request) {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "gpt-4-turbo");
        body.put("temperature", request.getTemperature());
        body.put("max_tokens", request.getMaxTokens());
        // Streaming is handled separately in streamChat(), so keep this false here
        body.put("stream", false);

        List<Map<String, String>> messages = new ArrayList<>();
        // system message
        if (request.getSystemPrompt() != null) {
            messages.add(Map.of(
                    "role", "system",
                    "content", request.getSystemPrompt()
            ));
        }
        // conversation history
        request.getHistory().forEach(msg ->
                messages.add(Map.of(
                        "role", msg.getRole(),
                        "content", msg.getContent()
                ))
        );
        // latest user message
        messages.add(Map.of(
                "role", "user",
                "content", request.getPrompt()
        ));
        body.put("messages", messages);
        return body;
    }
}
```
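`parseResponse` is left out above; in the real class you would use the injected `objectMapper` (Jackson) to read `choices[0].message.content` from the response JSON. As a dependency-free toy illustration of where the assistant text sits in the payload (not production parsing, it ignores escaping):

```java
public class ChoiceExtractor {
    // Toy extraction for illustration only: finds the "content" value inside
    // the first "message" object. Real code should use Jackson (objectMapper).
    public static String extractContent(String json) {
        int msg = json.indexOf("\"message\"");
        int key = json.indexOf("\"content\"", msg);
        int colon = json.indexOf(':', key);
        int start = json.indexOf('"', colon) + 1;
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        String sample = "{\"choices\":[{\"message\":"
                + "{\"role\":\"assistant\",\"content\":\"Hello!\"}}]}";
        System.out.println(extractContent(sample));
    }
}
```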
2.3 Qwen (Tongyi Qianwen) Implementation
Qwen's DashScope service offers an OpenAI-compatible mode, but its native API uses a different endpoint and wraps the body in `input` / `parameters`:
```java
@Component
@Primary // default implementation
@RequiredArgsConstructor
public class DashScopeClient implements LlmClient {

    private final RestTemplate restTemplate;

    @Value("${ai.dashscope.api-key}")
    private String apiKey;

    private static final String BASE_URL =
            "https://dashscope.aliyuncs.com/api/v1";

    @Override
    public LlmResponse chat(LlmRequest request) {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("Authorization", "Bearer " + apiKey);

        Map<String, Object> body = new HashMap<>();
        body.put("model", "qwen-turbo"); // or qwen-max, qwen-plus
        body.put("input", Map.of("messages", buildMessages(request)));
        body.put("parameters", Map.of(
                "temperature", request.getTemperature(),
                "max_tokens", request.getMaxTokens(),
                "result_format", "message"
        ));

        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(body, headers);
        ResponseEntity<String> response = restTemplate.exchange(
                BASE_URL + "/services/aigc/text-generation/generation",
                HttpMethod.POST,
                entity,
                String.class
        );
        return parseDashScopeResponse(response.getBody());
    }

    @Override
    public String getModelName() {
        return "qwen-turbo";
    }

    private List<Map<String, String>> buildMessages(LlmRequest request) {
        List<Map<String, String>> messages = new ArrayList<>();
        if (request.getSystemPrompt() != null) {
            messages.add(Map.of("role", "system", "content", request.getSystemPrompt()));
        }
        request.getHistory().forEach(msg ->
                messages.add(Map.of("role", msg.getRole(), "content", msg.getContent()))
        );
        messages.add(Map.of("role", "user", "content", request.getPrompt()));
        return messages;
    }
}
```
3. Handling Streaming Responses
Streaming output is key to the user experience of an AI application. SSE (Server-Sent Events) is the mainstream approach:
```java
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String prompt) {
    LlmRequest request = LlmRequest.builder()
            .prompt(prompt)
            .temperature(0.7)
            .maxTokens(2048)
            .build();
    // With TEXT_EVENT_STREAM, Spring adds the "data: ...\n\n" SSE framing itself,
    // so emit raw chunks rather than prefixing "data: " manually
    return llmClient.streamChat(request)
            .concatWith(Flux.just("[DONE]"));
}
```
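With `produces = TEXT_EVENT_STREAM_VALUE`, Spring serializes each `Flux` element into an SSE frame on the wire. The framing each chunk ends up in is simple enough to show without Spring (a plain-Java illustration):

```java
public class SseFrame {
    // Mirrors how an SSE data frame is laid out on the wire:
    // "data: <payload>" followed by a blank line terminates one event
    public static String frame(String chunk) {
        return "data: " + chunk + "\n\n";
    }

    public static void main(String[] args) {
        System.out.print(frame("Hello") + frame("[DONE]"));
    }
}
```

This is exactly the `data: ...\n\n` text the frontend reader below receives and decodes.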
Receiving on the frontend (Fetch API):
```javascript
const response = await fetch('/chat/stream?prompt=' + encodeURIComponent(prompt));
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value);
    // Process in real time: text looks like 'data: Hello...\n\n'
    document.getElementById('output').innerHTML += text;
}
```
4. Token Counting and Cost Control
Tokens are the billing unit of AI APIs. A Java implementation:
```java
@Component
public class TokenCounter {

    /**
     * Rough token estimate for mixed Chinese/English text.
     * Chinese: ~1.5 tokens per character (upper-bound estimate)
     * English: ~4 characters per token
     */
    public int estimateTokens(String text) {
        int chineseChars = (int) text.chars()
                .filter(c -> c >= 0x4E00 && c <= 0x9FA5) // CJK Unified Ideographs
                .count();
        int otherChars = text.length() - chineseChars;
        return (int) Math.ceil(chineseChars * 1.5) + (otherChars / 4);
    }

    /**
     * Trim conversation history to stay within the context window
     */
    public List<ChatMessage> truncateHistory(
            List<ChatMessage> history,
            int maxTokens) {
        int totalTokens = history.stream()
                .mapToInt(m -> estimateTokens(m.getContent()))
                .sum();
        if (totalTokens <= maxTokens) {
            return history;
        }
        // Drop from the oldest message until within budget
        List<ChatMessage> result = new ArrayList<>(history);
        while (totalTokens > maxTokens && !result.isEmpty()) {
            ChatMessage removed = result.remove(0);
            totalTokens -= estimateTokens(removed.getContent());
        }
        return result;
    }
}
```
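As a quick sanity check of the heuristic (the weights below mirror the estimator; the numbers are rough by design, since real tokenizers differ per model):

```java
public class TokenEstimateDemo {
    // Same heuristic as TokenCounter.estimateTokens: CJK chars weighted
    // 1.5 tokens each, other chars counted at 4 per token
    public static int estimate(String text) {
        int chinese = (int) text.chars()
                .filter(c -> c >= 0x4E00 && c <= 0x9FA5)
                .count();
        int other = text.length() - chinese;
        return (int) Math.ceil(chinese * 1.5) + other / 4;
    }

    public static void main(String[] args) {
        // 3 CJK chars -> ceil(4.5) = 5; 11 ASCII chars -> 11/4 = 2; total 7
        System.out.println(estimate("什么是Spring Boot"));
    }
}
```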
5. Multi-Model Routing
In real projects we want to pick the most suitable model for each scenario automatically:
```java
@Service
@RequiredArgsConstructor
public class LlmRouter {

    private final OpenAiClient openAiClient;
    private final DashScopeClient dashScopeClient;
    private final ClaudeClient claudeClient;

    public LlmClient route(String scenario) {
        return switch (scenario) {
            case "code_review" -> openAiClient;        // GPT is strong at code
            case "long_text_summary" -> claudeClient;  // Claude handles long context
            case "chinese_chat" -> dashScopeClient;    // Qwen excels at Chinese
            default -> dashScopeClient;
        };
    }

    // Or choose automatically by cost
    public LlmClient routeByBudget(double budgetPerRequest) {
        if (budgetPerRequest < 0.01) {
            return dashScopeClient;   // cheap
        } else if (budgetPerRequest < 0.05) {
            return openAiClient;      // mid-range
        } else {
            return claudeClient;      // pricey but strong
        }
    }
}
```
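The budget thresholds above are illustrative, not real price points; extracted as a plain function, the decision logic is trivially testable on its own:

```java
public class BudgetRouter {
    enum Model { QWEN, GPT, CLAUDE }

    // Same thresholds as routeByBudget: cheapest tier below $0.01/request,
    // mid-tier below $0.05, premium otherwise (illustrative values only)
    public static Model route(double budgetPerRequest) {
        if (budgetPerRequest < 0.01) return Model.QWEN;
        if (budgetPerRequest < 0.05) return Model.GPT;
        return Model.CLAUDE;
    }

    public static void main(String[] args) {
        System.out.println(route(0.005) + " " + route(0.02) + " " + route(0.10));
    }
}
```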
6. Error Handling and Retries
```java
@Component
public class LlmRetryHandler {

    private static final int MAX_RETRIES = 3;
    private static final Duration INITIAL_BACKOFF = Duration.ofSeconds(2);

    public LlmResponse executeWithRetry(Supplier<LlmResponse> action) {
        int attempts = 0;
        Exception lastException = null;
        while (attempts < MAX_RETRIES) {
            try {
                return action.get();
            } catch (RateLimitException e) {
                // 429: back off and retry
                lastException = e;
                attempts++;
                if (attempts < MAX_RETRIES) {
                    sleep(INITIAL_BACKOFF.multipliedBy(attempts));
                }
            } catch (ApiException e) {
                // Do not retry 4xx errors: the request itself is at fault
                if (e.getStatusCode() >= 400 && e.getStatusCode() < 500) {
                    throw new RuntimeException("API Error: " + e.getMessage(), e);
                }
                lastException = e;
                attempts++;
            }
        }
        throw new RuntimeException(
                "LLM call failed after " + MAX_RETRIES + " retries", lastException);
    }

    private void sleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```
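Note that `INITIAL_BACKOFF.multipliedBy(attempts)` grows linearly (2s, 4s, 6s). If you prefer true exponential backoff, the delay schedule doubles instead; a sketch of that variant:

```java
import java.time.Duration;

public class Backoff {
    // Exponential backoff: initial * 2^(attempt-1), i.e. 2s, 4s, 8s, ...
    public static Duration delay(Duration initial, int attempt) {
        return initial.multipliedBy(1L << (attempt - 1));
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(2);
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + delay(initial, attempt).toSeconds() + "s");
        }
    }
}
```

Swapping this into `LlmRetryHandler` is a one-line change in the `RateLimitException` branch; adding a random jitter on top is also common to avoid synchronized retry storms.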
7. Summary
This article covered:
| Topic | Key Point |
|---|---|
| Unified interface design | The LlmClient abstraction hides vendor differences |
| OpenAI integration | The standard Chat Completions API |
| Qwen integration | Alibaba Cloud's DashScope API |
| Streaming responses | SSE on the server + Fetch on the frontend |
| Token counting | Estimation for mixed Chinese/English text |
| Smart routing | Picking a model by scenario or by cost |
| Retry mechanism | Backoff for 429/5xx errors |
The core idea: don't let a framework own you. Once you understand that it is all HTTP + JSON, the Java standard library alone can do everything; frameworks merely make the process more elegant.