Preface
In real projects we often need to integrate several large language models at once: ChatGPT for English scenarios, Claude for long-document analysis, Qwen (Tongyi Qianwen) for Chinese customer service, and so on. This article walks through how to wrap and call multiple LLM APIs behind a single abstraction in Java, achieving a "write once, switch freely" architecture.
1. The Essence of an LLM API Call
No matter which vendor you use, an LLM API call boils down to HTTP POST + JSON:
```http
POST https://api.openai.com/v1/chat/completions
Authorization: Bearer sk-xxxx
Content-Type: application/json

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a professional Java tech blogger"},
    {"role": "user", "content": "What is Spring Boot?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
```
Once you understand this, you can make the call with Java's built-in HttpClient or any HTTP library, with no framework dependency at all.
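As a minimal sketch of that claim, the raw request above can be built with Java 11's `java.net.http.HttpClient` and nothing else (the `sk-xxxx` key is a placeholder, as in the HTTP example):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RawLlmCall {
    // Builds the same shape as the raw HTTP example: POST + JSON body + Bearer auth
    public static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = """
                {"model": "gpt-4", "messages": [{"role": "user", "content": "What is Spring Boot?"}]}""";
        HttpRequest request = buildRequest("sk-xxxx", body);
        System.out.println(request.method() + " " + request.uri());
        // Sending is one more line:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```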
2. Wrapping a Unified LLM Client
2.1 Define a Unified Interface
```java
public interface LlmClient {

    /** Synchronous chat */
    LlmResponse chat(LlmRequest request);

    /** Streaming chat */
    Flux<String> streamChat(LlmRequest request);

    /** Name of the model this client talks to */
    String getModelName();
}
```
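The `LlmRequest` / `LlmResponse` types are our own, not fixed by any framework. The implementations below assume Lombok-style builders and getters (`LlmRequest.builder()`, `getPrompt()`); as a dependency-free sketch with assumed field names, plain records work too:

```java
import java.util.List;

// Minimal data carriers for the LlmClient interface; field names are
// illustrative assumptions, extend to match your project
record ChatMessage(String role, String content) {}

record LlmRequest(String systemPrompt, String prompt,
                  List<ChatMessage> history,
                  double temperature, int maxTokens) {}

record LlmResponse(String content, int promptTokens, int completionTokens) {}

public class LlmDtoDemo {
    public static void main(String[] args) {
        LlmRequest req = new LlmRequest(
                "You are a helpful assistant", "What is Spring Boot?",
                List.of(), 0.7, 1000);
        System.out.println(req.prompt());
    }
}
```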
2.2 OpenAI Implementation
```java
@Component
@RequiredArgsConstructor
public class OpenAiClient implements LlmClient {

    private final RestTemplate restTemplate;
    private final ObjectMapper objectMapper;

    @Value("${ai.openai.api-key}")
    private String apiKey;

    @Value("${ai.openai.base-url:https://api.openai.com}")
    private String baseUrl;

    @Override
    public LlmResponse chat(LlmRequest request) {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.setBearerAuth(apiKey);

        Map<String, Object> body = buildBody(request);
        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(body, headers);

        ResponseEntity<String> response = restTemplate.exchange(
                baseUrl + "/v1/chat/completions",
                HttpMethod.POST,
                entity,
                String.class
        );
        return parseResponse(response.getBody());
    }

    @Override
    public Flux<String> streamChat(LlmRequest request) {
        // Core streaming logic
        return Flux.create(sink -> {
            // Streaming request via WebClient goes here
        });
    }

    @Override
    public String getModelName() {
        return "gpt-4-turbo";
    }

    private Map<String, Object> buildBody(LlmRequest request) {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "gpt-4-turbo");
        body.put("temperature", request.getTemperature());
        body.put("max_tokens", request.getMaxTokens());
        // Streaming is handled separately in streamChat(), so keep this false here
        body.put("stream", false);

        List<Map<String, String>> messages = new ArrayList<>();
        // system message
        if (request.getSystemPrompt() != null) {
            messages.add(Map.of(
                    "role", "system",
                    "content", request.getSystemPrompt()
            ));
        }
        // conversation history
        request.getHistory().forEach(msg ->
                messages.add(Map.of(
                        "role", msg.getRole(),
                        "content", msg.getContent()
                ))
        );
        // latest user message
        messages.add(Map.of(
                "role", "user",
                "content", request.getPrompt()
        ));
        body.put("messages", messages);
        return body;
    }
}
```
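`parseResponse` is left out above; in the real class you would use the injected `objectMapper` (Jackson) to read `choices[0].message.content` from the response JSON. As a dependency-free toy illustration of where the assistant text sits in the payload (not production parsing, it ignores escaping):

```java
public class ChoiceExtractor {
    // Toy extraction for illustration only: finds the "content" value inside
    // the first "message" object. Real code should use Jackson (objectMapper).
    public static String extractContent(String json) {
        int msg = json.indexOf("\"message\"");
        int key = json.indexOf("\"content\"", msg);
        int colon = json.indexOf(':', key);
        int start = json.indexOf('"', colon) + 1;
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        String sample = "{\"choices\":[{\"message\":"
                + "{\"role\":\"assistant\",\"content\":\"Hello!\"}}]}";
        System.out.println(extractContent(sample));
    }
}
```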
2.3 Qwen (Tongyi Qianwen) Implementation
Qwen's DashScope service offers an OpenAI-compatible mode, but its native API uses a different endpoint and wraps the body in `input` / `parameters`:
```java
@Component
@Primary // default implementation
@RequiredArgsConstructor
public class DashScopeClient implements LlmClient {

    private final RestTemplate restTemplate;

    @Value("${ai.dashscope.api-key}")
    private String apiKey;

    private static final String BASE_URL =
            "https://dashscope.aliyuncs.com/api/v1";

    @Override
    public LlmResponse chat(LlmRequest request) {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("Authorization", "Bearer " + apiKey);

        Map<String, Object> body = new HashMap<>();
        body.put("model", "qwen-turbo"); // or qwen-max, qwen-plus
        body.put("input", Map.of("messages", buildMessages(request)));
        body.put("parameters", Map.of(
                "temperature", request.getTemperature(),
                "max_tokens", request.getMaxTokens(),
                "result_format", "message"
        ));

        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(body, headers);
        ResponseEntity<String> response = restTemplate.exchange(
                BASE_URL + "/services/aigc/text-generation/generation",
                HttpMethod.POST,
                entity,
                String.class
        );
        return parseDashScopeResponse(response.getBody());
    }

    @Override
    public String getModelName() {
        return "qwen-turbo";
    }

    private List<Map<String, String>> buildMessages(LlmRequest request) {
        List<Map<String, String>> messages = new ArrayList<>();
        if (request.getSystemPrompt() != null) {
            messages.add(Map.of("role", "system", "content", request.getSystemPrompt()));
        }
        request.getHistory().forEach(msg ->
                messages.add(Map.of("role", msg.getRole(), "content", msg.getContent()))
        );
        messages.add(Map.of("role", "user", "content", request.getPrompt()));
        return messages;
    }
}
```
3. Handling Streaming Responses
Streaming output is key to the user experience of an AI application. SSE (Server-Sent Events) is the mainstream approach:
```java
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String prompt) {
    LlmRequest request = LlmRequest.builder()
            .prompt(prompt)
            .temperature(0.7)
            .maxTokens(2048)
            .build();
    // With TEXT_EVENT_STREAM, Spring adds the "data: ...\n\n" SSE framing itself,
    // so emit raw chunks rather than prefixing "data: " manually
    return llmClient.streamChat(request)
            .concatWith(Flux.just("[DONE]"));
}
```
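With `produces = TEXT_EVENT_STREAM_VALUE`, Spring serializes each `Flux` element into an SSE frame on the wire. The framing each chunk ends up in is simple enough to show without Spring (a plain-Java illustration):

```java
public class SseFrame {
    // Mirrors how an SSE data frame is laid out on the wire:
    // "data: <payload>" followed by a blank line terminates one event
    public static String frame(String chunk) {
        return "data: " + chunk + "\n\n";
    }

    public static void main(String[] args) {
        System.out.print(frame("Hello") + frame("[DONE]"));
    }
}
```

This is exactly the `data: ...\n\n` text the frontend reader below receives and decodes.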
Receiving on the frontend (Fetch API):
```javascript
const response = await fetch('/chat/stream?prompt=' + encodeURIComponent(prompt));
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value);
    // Process in real time: text looks like 'data: Hello...\n\n'
    document.getElementById('output').innerHTML += text;
}
```
4. Token Counting and Cost Control
Tokens are the billing unit of AI APIs. A Java implementation:
```java
@Component
public class TokenCounter {

    /**
     * Rough token estimate for mixed Chinese/English text.
     * Chinese: ~1.5 tokens per character (upper-bound estimate)
     * English: ~4 characters per token
     */
    public int estimateTokens(String text) {
        int chineseChars = (int) text.chars()
                .filter(c -> c >= 0x4E00 && c <= 0x9FA5) // CJK Unified Ideographs
                .count();
        int otherChars = text.length() - chineseChars;
        return (int) Math.ceil(chineseChars * 1.5) + (otherChars / 4);
    }

    /**
     * Trim conversation history to stay within the context window
     */
    public List<ChatMessage> truncateHistory(
            List<ChatMessage> history,
            int maxTokens) {
        int totalTokens = history.stream()
                .mapToInt(m -> estimateTokens(m.getContent()))
                .sum();
        if (totalTokens <= maxTokens) {
            return history;
        }
        // Drop from the oldest message until within budget
        List<ChatMessage> result = new ArrayList<>(history);
        while (totalTokens > maxTokens && !result.isEmpty()) {
            ChatMessage removed = result.remove(0);
            totalTokens -= estimateTokens(removed.getContent());
        }
        return result;
    }
}
```
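As a quick sanity check of the heuristic (the weights below mirror the estimator; the numbers are rough by design, since real tokenizers differ per model):

```java
public class TokenEstimateDemo {
    // Same heuristic as TokenCounter.estimateTokens: CJK chars weighted
    // 1.5 tokens each, other chars counted at 4 per token
    public static int estimate(String text) {
        int chinese = (int) text.chars()
                .filter(c -> c >= 0x4E00 && c <= 0x9FA5)
                .count();
        int other = text.length() - chinese;
        return (int) Math.ceil(chinese * 1.5) + other / 4;
    }

    public static void main(String[] args) {
        // 3 CJK chars -> ceil(4.5) = 5; 11 ASCII chars -> 11/4 = 2; total 7
        System.out.println(estimate("什么是Spring Boot"));
    }
}
```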
5. Multi-Model Routing
In real projects we want to pick the most suitable model for each scenario automatically:
```java
@Service
@RequiredArgsConstructor
public class LlmRouter {

    private final OpenAiClient openAiClient;
    private final DashScopeClient dashScopeClient;
    private final ClaudeClient claudeClient;

    public LlmClient route(String scenario) {
        return switch (scenario) {
            case "code_review" -> openAiClient;        // GPT is strong at code
            case "long_text_summary" -> claudeClient;  // Claude handles long context
            case "chinese_chat" -> dashScopeClient;    // Qwen excels at Chinese
            default -> dashScopeClient;
        };
    }

    // Or choose automatically by cost
    public LlmClient routeByBudget(double budgetPerRequest) {
        if (budgetPerRequest < 0.01) {
            return dashScopeClient;   // cheap
        } else if (budgetPerRequest < 0.05) {
            return openAiClient;      // mid-range
        } else {
            return claudeClient;      // pricey but strong
        }
    }
}
```
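The budget thresholds above are illustrative, not real price points; extracted as a plain function, the decision logic is trivially testable on its own:

```java
public class BudgetRouter {
    enum Model { QWEN, GPT, CLAUDE }

    // Same thresholds as routeByBudget: cheapest tier below $0.01/request,
    // mid-tier below $0.05, premium otherwise (illustrative values only)
    public static Model route(double budgetPerRequest) {
        if (budgetPerRequest < 0.01) return Model.QWEN;
        if (budgetPerRequest < 0.05) return Model.GPT;
        return Model.CLAUDE;
    }

    public static void main(String[] args) {
        System.out.println(route(0.005) + " " + route(0.02) + " " + route(0.10));
    }
}
```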
6. Error Handling and Retries
```java
@Component
public class LlmRetryHandler {

    private static final int MAX_RETRIES = 3;
    private static final Duration INITIAL_BACKOFF = Duration.ofSeconds(2);

    public LlmResponse executeWithRetry(Supplier<LlmResponse> action) {
        int attempts = 0;
        Exception lastException = null;
        while (attempts < MAX_RETRIES) {
            try {
                return action.get();
            } catch (RateLimitException e) {
                // 429: back off and retry
                lastException = e;
                attempts++;
                if (attempts < MAX_RETRIES) {
                    sleep(INITIAL_BACKOFF.multipliedBy(attempts));
                }
            } catch (ApiException e) {
                // Do not retry 4xx errors: the request itself is at fault
                if (e.getStatusCode() >= 400 && e.getStatusCode() < 500) {
                    throw new RuntimeException("API Error: " + e.getMessage(), e);
                }
                lastException = e;
                attempts++;
            }
        }
        throw new RuntimeException(
                "LLM call failed after " + MAX_RETRIES + " retries", lastException);
    }

    private void sleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```
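Note that `INITIAL_BACKOFF.multipliedBy(attempts)` grows linearly (2s, 4s, 6s). If you prefer true exponential backoff, the delay schedule doubles instead; a sketch of that variant:

```java
import java.time.Duration;

public class Backoff {
    // Exponential backoff: initial * 2^(attempt-1), i.e. 2s, 4s, 8s, ...
    public static Duration delay(Duration initial, int attempt) {
        return initial.multipliedBy(1L << (attempt - 1));
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(2);
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + delay(initial, attempt).toSeconds() + "s");
        }
    }
}
```

Swapping this into `LlmRetryHandler` is a one-line change in the `RateLimitException` branch; adding a random jitter on top is also common to avoid synchronized retry storms.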
7. Summary
This article covered:
| Topic | Key Point |
|---|---|
| Unified interface design | The LlmClient abstraction hides vendor differences |
| OpenAI integration | The standard Chat Completions API |
| Qwen integration | Alibaba Cloud's DashScope API |
| Streaming responses | SSE on the server + Fetch on the frontend |
| Token counting | Estimation for mixed Chinese/English text |
| Smart routing | Picking a model by scenario or by cost |
| Retry mechanism | Backoff for 429/5xx errors |
The core idea: don't let a framework own you. Once you understand that it is all HTTP + JSON, the Java standard library alone can do everything; frameworks merely make the process more elegant.