Case Studies: Integrating Large Language Models into Traditional Business Systems

Lightweight Invocation

Call a third-party LLM over its HTTP API (e.g., the OpenAI API or Alibaba's Tongyi Qianwen), or deploy an open-source model locally (e.g., the LLaMA family), and wrap it as a RESTful service that Java systems can consume.

Option 1: Development steps for calling a third-party LLM API (e.g., OpenAI / Tongyi Qianwen)

Build a model gateway service (a Spring Boot application) that receives requests from the Java business system, forwards them to the third-party API (e.g., OpenAI), handles authentication, errors, caching, and monitoring, and returns a standardized response.

text
User → business system frontend → business system backend → model gateway service → third-party LLM API
                                    ↓
                          (smart customer-service walkthrough)

1. The user types in the frontend: "How do I request a refund?"
   ↓
2. The frontend calls the business API: POST /api/customer-service/ask-ai
   ↓
3. CustomerController receives the request and calls CustomerService.getAIResponseForCustomer()
   ↓
4. CustomerService builds a ModelRequest and calls the model gateway: POST /api/v1/ai/chat
   ↓
5. ModelGatewayController receives the request and delegates to ModelGatewayService
   ↓
6. ModelGatewayService calls OpenAIService.callOpenAI()
   ↓
7. OpenAIService sends an HTTP request to https://api.openai.com/v1/chat/completions
   ↓
8. OpenAI returns the result, which is passed back up to CustomerController
   ↓
9. The frontend shows the AI reply: "You can click the refund button on the order page..."

①、Spring Boot project

xml
<dependencies>
    <!-- Web: RESTful API -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Redis cache -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- Resilience4j: circuit breaking / retry -->
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot2</artifactId>
        <version>2.1.0</version>
    </dependency>
    <!-- OkHttp HTTP client: replaces RestTemplate, better performance -->
    <dependency>
        <groupId>com.squareup.okhttp3</groupId>
        <artifactId>okhttp</artifactId>
        <version>4.12.0</version>
    </dependency>
    <!-- Micrometer metrics + Prometheus -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
    <!-- Lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- Testing -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

②、Configuration management (keys, third-party API parameters)

API keys (such as OpenAI's sk-xxx or Tongyi Qianwen's API_KEY) must not be hardcoded; use environment variables or a configuration center (e.g., Spring Cloud Config, HashiCorp Vault).

yml
server:
  port: 8080

# Third-party model configuration (extensible to multiple providers)
model:
  providers:
    openai:
      base-url: https://api.openai.com/v1
      api-key: ${OPENAI_API_KEY:default_key_for_dev}  # reads the OPENAI_API_KEY env var first
      models:
        chat: gpt-3.5-turbo
        completion: text-davinci-003
    tongyi:
      base-url: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
      api-key: ${TONGYI_API_KEY:default_tongyi_key}
      models:
        chat: qwen-turbo

# Redis cache configuration
spring:
  redis:
    host: localhost
    port: 6379
    password: ${REDIS_PASSWORD:}
    lettuce:
      pool:
        max-active: 8
        max-idle: 8

# Resilience4j circuit-breaker / retry configuration
resilience4j:
  retry:
    instances:
      openai-api:
        max-attempts: 3
        wait-duration: 1s
        exponential-backoff-multiplier: 2
  circuitbreaker:
    instances:
      openai-api:
        failure-rate-threshold: 50
        minimum-number-of-calls: 10
        sliding-window-size: 20

# Monitoring configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  metrics:
    export:
      prometheus:
        enabled: true
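
The gateway can bind the provider section above to a typed object with @ConfigurationProperties; a minimal sketch, where the class and field names (ModelProperties, ProviderConfig) are assumptions chosen to match the providerConfigs lookups used later in this article (relaxed binding maps base-url to baseUrl and api-key to apiKey automatically):

java
import java.util.HashMap;
import java.util.Map;

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// Binds model.providers.* to typed provider configs (illustrative names)
@Data
@Component
@ConfigurationProperties(prefix = "model")
public class ModelProperties {

    /** Keyed by provider id: "openai", "tongyi", "local-vllm", ... */
    private Map<String, ProviderConfig> providers = new HashMap<>();

    @Data
    public static class ProviderConfig {
        private String baseUrl;
        private String apiKey;
        private Map<String, String> models = new HashMap<>(); // e.g. chat -> gpt-3.5-turbo
    }
}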

③、Request and response models

java
/**
 * Unified request payload accepted from the Java business system
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ModelRequest {
	private String provider;   // model provider: openai / tongyi
	private String modelType;  // model type: chat / completion
	private List<Message> messages; // chat format: [{"role":"user","content":"..."}]
	private String prompt;     // completion format: pass the prompt directly
	private Integer maxTokens = 500;
	private Double temperature = 0.7;
	private Boolean stream = false; // whether to stream the output

	// further parameters: topP, presencePenalty, etc.
}

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Message {
	private String role;
	private String content;
}
java
/**
 * Unified response returned to the Java business system
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ModelResponse {
	private String id;       // request ID (generated UUID)
	private String provider; // provider actually called
	private String model;    // model name actually called
	private String content;  // generated text
	private Long created;    // timestamp
	private Usage usage;     // token usage
	private String error;    // error message (null on success)
}

// Token usage
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Usage {
	private Integer promptTokens;
	private Integer completionTokens;
	private Integer totalTokens;
}
java
/**
 * Raw response body of the OpenAI /chat/completions API (deserialization target)
 */
@Data
private static class OpenAiResponse {
    private String id;
    private String model;
    private Long created;
    private List<Choice> choices;
    private Usage usage; // note: field names match the custom Usage class, so it can be reused directly

    @Data
    private static class Choice {
        private Message message; // holds role and content
        private String finishReason;
    }
}

④、Java business system: port 8090, exposes the /api/customer-service/ask-ai endpoint called by the frontend

yml
server:
  port: 8090  # business system port

# Model gateway address
model:
  gateway:
    url: http://localhost:8080/api/v1/ai/chat
    timeout: 60s
java
@RestController
@RequestMapping("/api/customer-service")
@RequiredArgsConstructor
public class CustomerController {

	private final CustomerService customerService;

	/**
	 * Customer-service endpoint called by the frontend
	 */
	@PostMapping("/ask-ai")
	public ResponseEntity<ApiResponse> askAI(@RequestBody CustomerRequest customerRequest) {
		String aiResponse = customerService.getAIResponseForCustomer(
            customerRequest.getQuestion(), 
            customerRequest.getSessionId()
        );

		ApiResponse response = new ApiResponse();
        response.setCode(200);
        response.setMessage("success");
        response.setData(aiResponse);

		return ResponseEntity.ok(response);
	}
}

// Data models
@Data
class CustomerRequest {
    private String question;   // user question
    private String sessionId;  // session ID
    private String userId;     // user ID
}

@Data
class ApiResponse {
    private int code;
    private String message;
    private Object data;
}
java
@Slf4j
@Service
@RequiredArgsConstructor
public class CustomerService {

	// HTTP client used to call the model gateway
	private final RestTemplate restTemplate;

	@Value("${model.gateway.url}") // gateway address from application.yml
	private String gatewayUrl;

	/**
     * Business method: fetch the AI reply for a customer question
     */
     public String getAIResponseForCustomer(String customerQuestion, String sessionId) {
		try {
			// 1. Build the request object (fields match the ModelRequest defined in ③)
			ModelRequest request = new ModelRequest();
			request.setProvider("openai");  // use OpenAI
            request.setModelType("chat");   // chat model (mapped to gpt-3.5-turbo by the gateway config)
            request.setMessages(List.of(new Message("user", customerQuestion)));
            request.setMaxTokens(500);      // maximum tokens to generate
            // sessionId could be used here to prepend conversation history to messages

			// 2. Call the model gateway
			ModelResponse response = restTemplate.postForObject(gatewayUrl, request, ModelResponse.class);

           	// 3. Return the AI-generated content
            return response != null ? response.getContent() : "Sorry, the AI service is temporarily unavailable";
		} catch (Exception e) {
			log.error("Call to AI gateway failed: {}", e.getMessage());
            return "AI service error, please try again later";
		}
	}
}

Model gateway service: port 8080, exposes the /api/v1/ai/chat endpoint, called only by the Java business system

yml
server:
  port: 8080  # gateway port

model:
  providers:
    openai:
      base-url: https://api.openai.com/v1
      api-key: ${OPENAI_API_KEY}
      timeout: 30s
    tongyi:
      base-url: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation
      api-key: ${TONGYI_API_KEY}
      timeout: 30s
java
@RestController
@RequestMapping("/api/v1/ai")
@RequiredArgsConstructor
public class ModelGatewayController {
    
    private final ModelGatewayService modelGatewayService;
    
    /**
     * Core gateway endpoint - the Java business system calls this
     */
    @PostMapping("/chat")
    public ResponseEntity<ModelResponse> chatCompletion(@RequestBody ModelRequest request) {
        ModelResponse response = modelGatewayService.processChatRequest(request);
        return ResponseEntity.ok(response);
    }
    
    /**
     * Health check endpoint
     */
    @GetMapping("/health")
    public ResponseEntity<String> health() {
        return ResponseEntity.ok("Model Gateway is running");
    }
}
java
@Service
@RequiredArgsConstructor // generates a constructor for the final fields
public class ModelGatewayService {

	private final OpenAIService openAIService;
	private final TongyiService tongyiService;
	private final CacheManager cacheManager;

	public ModelResponse processChatRequest(ModelRequest request) {
        // 1. Validate parameters
        validateRequest(request);
        
        // 2. Check the cache (signature matches the CacheManager in ⑤)
        String cacheKey = generateCacheKey(request);
        ModelResponse cached = cacheManager.get(cacheKey, ModelResponse.class);
        if (cached != null) {
            return cached;
        }
        
        // 3. Route to the matching provider service
        ModelResponse response;
        switch (request.getProvider().toLowerCase()) {
            case "openai":
                response = openAIService.callOpenAI(request);
                break;
            case "tongyi":
                response = tongyiService.callTongyi(request);
                break;
            default:
                throw new IllegalArgumentException("Unsupported provider: " + request.getProvider());
        }
        
        // 4. Cache the result
        cacheManager.set(cacheKey, response, Duration.ofHours(1));
        
        return response;
    }
    
    private void validateRequest(ModelRequest request) {
        if (StringUtils.isEmpty(request.getProvider())) {
            throw new IllegalArgumentException("Provider is required");
        }
        if (CollectionUtils.isEmpty(request.getMessages())) {
            throw new IllegalArgumentException("Messages are required");
        }
    }
    
    private String generateCacheKey(ModelRequest request) {
        return String.format("ai_chat:%s:%s:%s", 
            request.getProvider(), 
            request.getModelType(),
            DigestUtils.md5DigestAsHex(request.getMessages().toString().getBytes()));
    }
}
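
The OpenAIService invoked above is not shown in the original steps; a minimal sketch using the OkHttp client declared in ① and the Resilience4j instances configured in ② might look like the following (the payload mapping is an assumption based on the public /chat/completions contract, and a TongyiService analog would follow the same pattern against the DashScope endpoint):

java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.Map;
import java.util.UUID;

@Service
public class OpenAIService {

    private static final MediaType JSON = MediaType.get("application/json");

    private final OkHttpClient client = new OkHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();

    @Value("${model.providers.openai.base-url}")
    private String baseUrl;

    @Value("${model.providers.openai.api-key}")
    private String apiKey;

    @Value("${model.providers.openai.models.chat}")
    private String chatModel;

    @Retry(name = "openai-api")
    @CircuitBreaker(name = "openai-api")
    public ModelResponse callOpenAI(ModelRequest req) {
        try {
            // Map the unified request onto the OpenAI chat/completions payload
            String payload = mapper.writeValueAsString(Map.of(
                    "model", chatModel,
                    "messages", req.getMessages(),
                    "max_tokens", req.getMaxTokens(),
                    "temperature", req.getTemperature()));

            Request httpRequest = new Request.Builder()
                    .url(baseUrl + "/chat/completions")
                    .header("Authorization", "Bearer " + apiKey)
                    .post(RequestBody.create(payload, JSON))
                    .build();

            try (Response httpResponse = client.newCall(httpRequest).execute()) {
                if (!httpResponse.isSuccessful()) {
                    throw new IOException("OpenAI API returned HTTP " + httpResponse.code());
                }
                JsonNode root = mapper.readTree(httpResponse.body().string());

                // Fill the unified response from the raw API result
                ModelResponse resp = new ModelResponse();
                resp.setId(UUID.randomUUID().toString());
                resp.setProvider("openai");
                resp.setModel(root.path("model").asText());
                resp.setContent(root.path("choices").get(0).path("message").path("content").asText());
                resp.setCreated(root.path("created").asLong());
                return resp;
            }
        } catch (IOException e) {
            throw new RuntimeException("OpenAI call failed", e);
        }
    }
}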

⑤、Cache utility class

java
@Component
@RequiredArgsConstructor
public class CacheManager {

	private final StringRedisTemplate redisTemplate;
	private final ObjectMapper objectMapper = new ObjectMapper(); // reuse one mapper instead of creating one per call

	public void set(String key, Object value, Duration ttl) {
		try {
			String json = objectMapper.writeValueAsString(value);
            redisTemplate.opsForValue().set(key, json, ttl);
		} catch (JsonProcessingException e) {
			throw new RuntimeException("Cache serialization failed", e);
		}
	}

	public <T> T get(String key, Class<T> clazz) {
		String json = redisTemplate.opsForValue().get(key);
        if (json == null) {
        	return null;
        }
        try {
            return objectMapper.readValue(json, clazz);
        } catch (JsonProcessingException e) {
            throw new RuntimeException("Cache deserialization failed", e);
        }
	}
}

⑥、The business system (e.g., a smart customer-service backend) calls the gateway through an HTTP client

java
@Service
@RequiredArgsConstructor
public class CustomerService {

	private final WebClient webClient; // Spring WebClient (reactive) - an alternative to the RestTemplate client in ④

	public String getAiSuggestion(String userQuestion) {
		// Build the request body
		ModelRequest request = new ModelRequest();
		request.setProvider("openai");
        request.setModelType("chat");
        request.setMessages(List.of(new Message("user", userQuestion)));
        request.setMaxTokens(300);

		// Call the gateway (the same /api/v1/ai/chat endpoint as above)
        ModelResponse response = webClient.post()
                .uri("http://localhost:8080/api/v1/ai/chat")
                .bodyValue(request)
                .retrieve()
                .bodyToMono(ModelResponse.class)
                .block(); // synchronous call (could also be consumed asynchronously)

        return response != null ? response.getContent() : "AI assistance is temporarily unavailable";
	}
}

Option 2: Development steps for wrapping a locally deployed open-source model as a service

Compared with Option 1, the model gateway in Option 2 talks to a locally deployed inference server (e.g., vLLM or TGI) instead of a third-party API.

①、Choose an inference server: prefer vLLM (high performance, OpenAI-compatible API). Example deployment command:

bash
# Start the vLLM server (Mistral-7B model, port 8000, OpenAI-compatible API)
# --quantization awq enables quantized inference (optional)
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.1 \
  --quantization awq \
  --port 8000 \
  --gpu-memory-utilization 0.9

Verify the service: a GET on http://localhost:8000/v1/models returns the list of models being served.
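
The same check can be done from Java with the JDK's built-in HttpClient (JDK 11+); a throwaway sketch:

java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// One-off check that the vLLM server is up and lists its models
public class VllmModelsCheck {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8000/v1/models"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expect 200
        System.out.println(response.body());       // JSON list of served models
    }
}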

②、Point the gateway at the local model server

yml
model:
  providers:
    local-vllm: # local model provider
      base-url: http://localhost:8000/v1 # vLLM's OpenAI-compatible API address
      api-key: "no-key-needed" # vLLM needs no API key by default
      models:
        chat: mistral-7b-instruct
java
private ModelResponse callLocalVllmApi(String url, Headers headers, String body, ProviderConfig config) throws IOException {
    // Local model call: the API is OpenAI-compatible, so the callOpenAiApi logic is reused directly
    return callOpenAiApi(url, headers, body, config); 
}

③、Multi-model routing and dynamic configuration

java
// Route to a different local model service based on the provider in the request
if ("local-llama2".equals(request.getProvider())) {
    config = providerConfigs.get("local-llama2"); // points at the vLLM instance serving LLaMA-2
}

Deep Integration

If high performance is required (e.g., real-time dialogue), wrap the inference engine (vLLM, TGI) as an SDK callable from Java (via JNI or gRPC); alternatively, use a message queue (Kafka) for asynchronous interaction (the customer-service system receives a user question → pushes it to the LLM service → receives the result back).

SDK Encapsulation

Option 1: Wrap the inference engine as a Java SDK over gRPC, for real-time, high-performance scenarios

text
+--------------------+       +------------------------+       +--------------------------+
| Java business      |       | Model SDK (Java)       |       | Inference service        |
| system             | gRPC  | (gRPC stub + wrapper:  | gRPC  | cluster                  |
| (real-time dialog, |◄─────►| pooling / circuit      |◄─────►| (vLLM/TGI + GPU nodes,   |
| Spring Boot)       |       | breaking / metrics)    |       | K8s StatefulSet)         |
+--------------------+       +------------------------+       +--------------------------+
                                                                    ▲
                                                                    │ service discovery (Consul/Nacos)
                                                          +---------+---------------+
                                                          | load balancing (K8s Service) |
                                                          +-------------------------+

①、Deploy the inference service (vLLM/TGI) in the cluster

Choose an inference engine: prefer vLLM (PagedAttention memory management, continuous batching, streaming output) or TGI (Hugging Face Text Generation Inference, which has a native gRPC interface).

  • vLLM deployment (exposing a gRPC interface)
bash
# Start the vLLM server (Mistral-7B model, gRPC port 50051, quantized)
# --quantization awq: 4-bit quantization, ~8GB of VRAM
# --tensor-parallel-size 2: two-way tensor parallelism (if multiple GPUs)
python -m vllm.entrypoints.grpc.server \
  --model mistralai/Mistral-7B-Instruct-v0.1 \
  --quantization awq \
  --port 50051 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 2
  • TGI deployment (native gRPC)
bash
docker run -p 50051:80 -v /path/to/models:/data ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.1 \
  --grpc-port 80  # container-internal gRPC port, mapped to 50051 on the host

Verify the service: test the interface with a gRPC client tool such as grpcurl.

bash
grpcurl -plaintext -d '{"inputs":"hello"}' localhost:50051 text_generation.InferenceService/Generate

②、Define the gRPC service interface (using TGI as the example)

protobuf
syntax = "proto3";
package text_generation;

service InferenceService {
  rpc Generate(GenerateRequest) returns (GenerateResponse) {}  // synchronous call
  rpc GenerateStream(GenerateRequest) returns (stream GenerateStreamResponse) {}  // streaming call
}

message GenerateRequest {
  string inputs = 1;  // prompt text
  Parameters parameters = 2;  // inference parameters (temperature, max tokens, ...)
  bool stream = 3;  // whether to stream the output
}

message Parameters {
  float temperature = 1;  // randomness (0~1)
  float top_p = 2;  // nucleus sampling (0~1)
  int32 max_new_tokens = 3;  // maximum number of generated tokens
  repeated string stop_sequences = 4;  // stop sequences
}

message GenerateResponse {
  string generated_text = 1;  // complete generated text
  Usage usage = 2;  // token usage
}

message GenerateStreamResponse {
  string text = 1;  // streamed fragment (per token / per sentence)
  bool stop = 2;  // whether generation has finished
}

message Usage {
  int32 input_tokens = 1;
  int32 output_tokens = 2;
}

③、Develop the Java SDK (wrapping the gRPC client)

The Java side calls the inference service through gRPC stubs, wrapped as an easy-to-use SDK that hides the underlying details.

xml
<dependencies>
  <!-- gRPC -->
  <dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-netty-shaded</artifactId>
    <version>1.58.0</version>
  </dependency>
  <dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-protobuf</artifactId>
    <version>1.58.0</version>
  </dependency>
  <dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-stub</artifactId>
    <version>1.58.0</version>
  </dependency>
  <!-- Circuit breaking / retry -->
  <dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-grpc</artifactId>
    <version>2.1.0</version>
  </dependency>
</dependencies>
java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;
import text_generation.*;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class LLMSDK {
    private final ManagedChannel channel;
    private final InferenceServiceGrpc.InferenceServiceBlockingStub blockingStub;  // synchronous calls
    private final InferenceServiceGrpc.InferenceServiceStub asyncStub;  // streaming calls

    // Initialize the connection (the target can come from service discovery)
    public LLMSDK(String target) {
        this.channel = ManagedChannelBuilder.forTarget(target)
                .usePlaintext()  // use TLS in production
                .maxInboundMessageSize(10 * 1024 * 1024)  // 10MB max message size
                .keepAliveTime(30, TimeUnit.SECONDS)  // keep-alive
                .build();
        this.blockingStub = InferenceServiceGrpc.newBlockingStub(channel);
        this.asyncStub = InferenceServiceGrpc.newStub(channel);
    }

    // Synchronous (non-streaming) call
    public String generateSync(String prompt, Parameters params) {
        GenerateRequest request = GenerateRequest.newBuilder()
                .setInputs(prompt)
                .setParameters(params)
                .setStream(false)
                .build();
        GenerateResponse response = blockingStub.generate(request);
        return response.getGeneratedText();
    }

    // Asynchronous streaming call (token-by-token display for real-time dialogue)
    public void generateStream(String prompt, Parameters params, Consumer<String> chunkHandler, Runnable onComplete) {
        GenerateRequest request = GenerateRequest.newBuilder()
                .setInputs(prompt)
                .setParameters(params)
                .setStream(true)
                .build();

        asyncStub.generateStream(request, new StreamObserver<GenerateStreamResponse>() {
            @Override
            public void onNext(GenerateStreamResponse value) {
                chunkHandler.accept(value.getText());  // per-chunk callback (e.g., per token)
            }
            @Override
            public void onError(Throwable t) { /* error handling */ }
            @Override
            public void onCompleted() { onComplete.run(); }  // completion callback
        });
    }

    // Shut down the connection
    public void shutdown() throws InterruptedException {
        channel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
    }
}

④、Integrate the SDK into the business system

Flow: user asks a question → build the prompt (including conversation history) → streaming SDK call → push results to the frontend over WebSocket.

java
@Service
@RequiredArgsConstructor
public class RealtimeDialogueService {
    private final LLMSDK llmSdk;  // injected SDK
    private final WebSocketHandler webSocketHandler;  // WebSocket push

    // Handle a real-time user dialogue request
    public void handleUserQuery(String sessionId, String userMessage) {
        // 1. Build the prompt (with conversation history; simplified here)
        String prompt = "User: " + userMessage + "\nAssistant: ";

        // 2. Configure inference parameters (streaming is requested via the SDK call
        //    below; the Parameters message itself has no stream field)
        Parameters params = Parameters.newBuilder()
                .setTemperature(0.7f)
                .setMaxNewTokens(200)
                .build();

        // 3. Call the SDK's streaming interface
        llmSdk.generateStream(
            prompt, 
            params,
            chunk -> webSocketHandler.push(sessionId, chunk),  // push chunks to the frontend as they arrive
            () -> webSocketHandler.push(sessionId, "[DONE]")   // completion marker
        );
    }
}
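
The WebSocketHandler used above is assumed rather than defined; a minimal sketch that keeps a session registry keyed by sessionId (the query-parameter handshake is also an assumption):

java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.stereotype.Component;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.handler.TextWebSocketHandler;
import org.springframework.web.util.UriComponentsBuilder;

@Component
public class WebSocketHandler extends TextWebSocketHandler {

    private final Map<String, WebSocketSession> sessions = new ConcurrentHashMap<>();

    @Override
    public void afterConnectionEstablished(WebSocketSession session) {
        // Assumes the client connects with ws://host/dialogue?sessionId=xxx
        String sessionId = UriComponentsBuilder.fromUri(session.getUri())
                .build().getQueryParams().getFirst("sessionId");
        sessions.put(sessionId, session);
    }

    // Push one generated chunk (or the [DONE] marker) to the browser
    public void push(String sessionId, String chunk) {
        WebSocketSession session = sessions.get(sessionId);
        if (session != null && session.isOpen()) {
            try {
                session.sendMessage(new TextMessage(chunk));
            } catch (IOException e) {
                sessions.remove(sessionId); // drop broken connections
            }
        }
    }
}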

Option 2: Asynchronous interaction via a message queue (Kafka)

Scenario: an e-commerce customer-service system receives 100k+ tickets per day and must classify them automatically (logistics / after-sales / inquiry) and generate a draft reply. Requirements: do not block ticket submission, minute-level latency is acceptable, and human review must be supported afterwards.

text
+-------------------+       +----------------+       +--------------------------+
| User / frontend   |       |  Kafka topic   |       | LLM inference service    |
| (submits ticket)  |------>|  ticket-req    |------>| (consumer: vLLM/TGI +    |
+-------------------+       +----------------+       | business logic)          |
                                                      +--------------------------+
                                                                  ▼
                                                          +-----------------------+
                                                          |  Kafka topic          |
                                                          |  ticket-resp          |
                                                          +-----------------------+
                                                                  ▼
                                                          +-----------------------+
                                                          | Result processor      |
                                                          | (consumer: update DB, |
                                                          |  notify human agents) |
                                                          +-----------------------+

①、Design the Kafka message flows

  • ticket-req: ticket request topic (partitions = number of inference instances × 2, to guarantee parallelism)
  • ticket-resp: ticket result topic (partitioned by ticketId hash so each ticket's events stay ordered)
  • ticket-dlq: dead-letter topic (for tickets whose retries are exhausted)
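
These topics can be declared in code with Spring Kafka's TopicBuilder; a sketch assuming four inference instances (hence eight partitions, per the sizing rule above) and a replication factor of 3:

java
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class KafkaTopicConfig {

    @Bean
    public NewTopic ticketReq() {
        // partitions = inference instances × 2 (here: 4 × 2)
        return TopicBuilder.name("ticket-req").partitions(8).replicas(3).build();
    }

    @Bean
    public NewTopic ticketResp() {
        return TopicBuilder.name("ticket-resp").partitions(8).replicas(3).build();
    }

    @Bean
    public NewTopic ticketDlq() {
        return TopicBuilder.name("ticket-dlq").partitions(8).replicas(3).build();
    }
}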

Message formats

json
// Request message (ticket-req)
{
  "type": "record",
  "name": "TicketRequest",
  "fields": [
    {"name": "ticketId", "type": "string"},  // ticket ID (UUID)
    {"name": "content", "type": "string"},  // ticket body
    {"name": "timestamp", "type": "long"},  // submission time
    {"name": "priority", "type": ["string", "null"], "default": "medium"}  // priority (an Avro default must match the first union branch)
  ]
}

// Response message (ticket-resp)
{
  "type": "record",
  "name": "TicketResponse",
  "fields": [
    {"name": "ticketId", "type": "string"},
    {"name": "category", "type": "string"},  // classification (logistics / after-sales / inquiry)
    {"name": "reply", "type": "string"},  // generated reply
    {"name": "status", "type": "string"},  // SUCCESS / FAILED
    {"name": "errorMsg", "type": ["null", "string"], "default": null}
  ]
}
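
The Java POJOs mirroring these two schemas, used by the producer and consumers below (a minimal sketch; field names follow the schemas, with Lombok assumed as elsewhere in this article):

java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class TicketRequest {
    private String ticketId;   // ticket ID (UUID)
    private String content;    // ticket body
    private Long timestamp;    // submission time
    private String priority;   // low / medium / high
}

@Data
@NoArgsConstructor
@AllArgsConstructor
public class TicketResponse {
    private String ticketId;
    private String category;   // classification result
    private String reply;      // generated reply
    private String status;     // SUCCESS / FAILED
    private String errorMsg;   // null on success
}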

②、Java business system (producer): publish ticket requests

java
@Slf4j
@Service
@RequiredArgsConstructor
public class TicketProducer {
	private final KafkaTemplate<String, TicketRequest> kafkaTemplate;

	// Publish a ticket request to Kafka
	public void sendTicketRequest(TicketRequest request) {
		// Partition key: hash of ticketId (keeps events for the same ticket in order)
		String key = String.valueOf(request.getTicketId().hashCode());
		// Send the message
		kafkaTemplate.send("ticket-req", key, request)
			.addCallback(
				result -> log.info("Ticket {} sent successfully", request.getTicketId()),
				ex -> log.error("Ticket {} failed to send", request.getTicketId(), ex)
			);
	}
}

③、LLM inference service (consumer): process tickets and produce results

Deployment: a Kubernetes Deployment where each Pod runs one member of the Kafka consumer group, consuming the ticket-req topic.

java
@Slf4j
@Service
@RequiredArgsConstructor
public class TicketConsumer {
	private final KafkaTemplate<String, TicketResponse> respTemplate;
	private final LLMSDK llmSdk; // the gRPC SDK from Option 1

	@KafkaListener(topics = "ticket-req", groupId = "model-worker-group")
	public void consume(ConsumerRecord<String, TicketRequest> record, Acknowledgment ack) {
		try {
			TicketRequest req = record.value();
            // 1. Build the prompt (classification + reply generation)
            String prompt = String.format("Classify this ticket: %s\nCategories: logistics, after-sales, inquiry\nGenerate a reply:", req.getContent());
            // 2. Call the model (synchronous, non-streaming)
            String result = llmSdk.generateSync(prompt, Parameters.getDefaultInstance());
            // 3. Parse the result (simplified: assumes the model returns "category: logistics; reply: your parcel is expected...")
            TicketResponse resp = parseResult(req.getTicketId(), result);
            // 4. Publish to the result topic
            respTemplate.send("ticket-resp", req.getTicketId(), resp);
            // 5. Commit the offset manually (so no message is lost)
            ack.acknowledge();
		} catch (Exception e) {
			log.error("Ticket {} processing failed", record.key(), e);
            // Rethrow so the container's error handler retries and, after 3 attempts,
            // routes the record to the dead-letter topic (see the config sketch below)
            throw new RuntimeException(e);
		}
	}

	// Simplified parsing, assuming the "category: ...; reply: ..." output format noted above
	private TicketResponse parseResult(String ticketId, String result) {
		String[] parts = result.split(";", 2);
		String category = parts[0].replace("category:", "").trim();
		String reply = parts.length > 1 ? parts[1].replace("reply:", "").trim() : "";
		return new TicketResponse(ticketId, category, reply, "SUCCESS", null);
	}
}
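
Neither the manual ack mode nor the retry-then-dead-letter behavior works without listener-container configuration; a sketch wiring both, assuming a JSON-capable KafkaTemplate is available for republishing failed records to ticket-dlq:

java
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, TicketRequest> kafkaListenerContainerFactory(
            ConsumerFactory<String, TicketRequest> consumerFactory,
            KafkaTemplate<String, Object> dlqTemplate) {

        ConcurrentKafkaListenerContainerFactory<String, TicketRequest> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);

        // Manual offset commit, matching the Acknowledgment parameter in TicketConsumer
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);

        // 2 retries after the first failure (3 attempts total), then publish to ticket-dlq
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
                dlqTemplate, (record, ex) -> new TopicPartition("ticket-dlq", record.partition()));
        factory.setCommonErrorHandler(new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 2)));

        return factory;
    }
}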

④、Result processor (consumer): update status and notify

java
@Service
@RequiredArgsConstructor
public class ResultProcessor {
    private final TicketRepository ticketRepo;  // database access
    private final NotificationService notificationService;  // notifies human agents

    @KafkaListener(topics = "ticket-resp", groupId = "result-processor-group")
    public void processResult(ConsumerRecord<String, TicketResponse> record) {
        TicketResponse resp = record.value();
        // 1. Update the ticket status
        ticketRepo.updateStatus(resp.getTicketId(), resp.getStatus(), resp.getCategory(), resp.getReply());
        // 2. Notify a human agent (for review if needed)
        if ("SUCCESS".equals(resp.getStatus())) {
            notificationService.notifyAgent(resp.getTicketId());
        }
    }
}