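Spring AI Alibaba in Enterprise Practice: Building an Intelligent Customer Service System from Scratch
Overview: This is the final article in the Spring AI Alibaba series. Using an intelligent customer service system that is deployed in production, it ties together the techniques from the previous fourteen articles: a RAG knowledge base, multi-turn dialogue, intent recognition with slot filling, Agent tool calling, long-lived WebSocket connections, Resilience4j circuit breaking and fallback, and a Micrometer monitoring dashboard. This is not a demo; it is a production-grade architecture you can use directly as a reference.
I. Overall System Architecture
Before writing any code, think the architecture through. An intelligent customer service system looks simple, but it is really several complex subsystems working together. I have seen too many projects pile intent recognition, RAG retrieval, and Agent calls into a single Service class, and then spend hours tracing every failure. Layering and decoupling come first.
1.1 The Four-Layer Architecture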
╔══════════════════════════════════════════════════════════════════════╗
║                     Gateway Layer (API Gateway)                      ║
║ Spring Cloud Gateway | OAuth2 Token | Rate Limiting | Canary Routing ║
╚══════════════════════════════════════════════════════════════════════╝
              ↓ WebSocket / REST                        ↓ REST
╔════════════════════════════╗    ╔════════════════════════════════════╗
║       Business Layer       ║    ║       AI Layer (AI Service)        ║
║  ┌──────────────────────┐  ║    ║   ┌────────────────────────────┐   ║
║  │ Session management   │  ║    ║   │ Intent + slot filling      │   ║
║  │ (WebSocket)          │  ║    ║   └────────────────────────────┘   ║
║  └──────────────────────┘  ║    ║   ┌────────────────────────────┐   ║
║  ┌──────────────────────┐  ║    ║   │ RAG knowledge-base Q&A     │   ║
║  │ Ticket management    │◄─╫────╫───│ (QuestionAnswerAdvisor)    │   ║
║  └──────────────────────┘  ║    ║   └────────────────────────────┘   ║
║  ┌──────────────────────┐  ║    ║   ┌────────────────────────────┐   ║
║  │ Human agent dispatch │  ║    ║   │ Agent tool calling         │   ║
║  └──────────────────────┘  ║    ║   │ (ReactAgent)               │   ║
║  ┌──────────────────────┐  ║    ║   └────────────────────────────┘   ║
║  │ Notification push    │  ║    ║   ┌────────────────────────────┐   ║
║  └──────────────────────┘  ║    ║   │ Model routing + fallback   │   ║
╚════════════════════════════╝    ║   └────────────────────────────┘   ║
                                  ╚════════════════════════════════════╝
              ↓                                         ↓
╔══════════════════════════════════════════════════════════════════════╗
║                              Data Layer                              ║
║ MySQL (tickets, chats) | Redis (sessions, limits) | Milvus (vectors) ║
║    Elasticsearch (full-text) | MinIO (files) | Kafka (messaging)     ║
╚══════════════════════════════════════════════════════════════════════╝
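1.2 Microservice Decomposition
Following the principle that each microservice does exactly one thing, we split the system into the following services:
| Service | Core responsibilities | Tech stack |
|---|---|---|
| gateway-service | Unified entry point, authentication, rate limiting | Spring Cloud Gateway + OAuth2 |
| session-service | WebSocket session management, message routing | Spring WebSocket + Redis |
| ai-service | Intent recognition, RAG Q&A, Agent calls | Spring AI Alibaba 1.1 |
| knowledge-service | Knowledge base management, document ingestion, vector retrieval | Spring AI + Milvus |
| ticket-service | Ticket CRUD, state transitions, human agents | Spring Boot + MySQL |
| analytics-service | Conversation auditing, satisfaction analysis, monitoring dashboard | Micrometer + Grafana |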
1.3 Project Structure
customer-service-platform/
├── gateway-service/
│ └── src/main/java/.../gateway/
│ ├── config/GatewayConfig.java
│ └── filter/AuthFilter.java
├── session-service/
│ └── src/main/java/.../session/
│ ├── websocket/ChatWebSocketHandler.java
│ └── service/SessionManager.java
├── ai-service/ ← core module
│ └── src/main/java/.../ai/
│ ├── intent/IntentRecognitionService.java
│ ├── rag/CustomerServiceRAG.java
│ ├── agent/CustomerServiceAgent.java
│ └── fallback/HumanEscalationService.java
├── knowledge-service/
├── ticket-service/
├── analytics-service/
└── common/ ← shared module
├── dto/
└── event/
II. Maven Dependencies and Base Configuration
2.1 Dependency Management in the Parent POM
xml
<!-- customer-service-platform/pom.xml -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.5</version>
</parent>
<properties>
<java.version>17</java.version>
<spring-cloud.version>2023.0.3</spring-cloud.version>
<spring-ai-alibaba.version>1.1.2.2</spring-ai-alibaba.version>
</properties>
<dependencyManagement>
<dependencies>
<!-- Spring AI Alibaba BOM -->
<dependency>
<groupId>com.alibaba.cloud.ai</groupId>
<artifactId>spring-ai-alibaba-bom</artifactId>
<version>${spring-ai-alibaba.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<!-- Spring Cloud BOM -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${spring-cloud.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
2.2 Core Dependencies of the AI Service
xml
<!-- ai-service/pom.xml -->
<dependencies>
    <!-- Spring AI Alibaba core -->
    <dependency>
        <groupId>com.alibaba.cloud.ai</groupId>
        <artifactId>spring-ai-alibaba-starter</artifactId>
    </dependency>
    <!-- RAG vector store: Milvus (starter artifact name used since Spring AI 1.0) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-milvus</artifactId>
    </dependency>
    <!-- PDF document parsing -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <!-- Redis session storage -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- Resilience4j circuit breaking -->
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot3</artifactId>
        <version>2.2.0</version>
    </dependency>
    <!-- Micrometer monitoring -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
    <!-- WebFlux (streaming responses) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>
2.3 Core Configuration File
yaml
# ai-service/src/main/resources/application.yml
spring:
  application:
    name: ai-service
  ai:
    dashscope:
      api-key: ${DASHSCOPE_API_KEY}
      chat:
        options:
          model: qwen-max
          temperature: 0.3   # customer service: keep replies conservative
          max-tokens: 2048
      embedding:
        options:
          model: text-embedding-v3
    # Milvus vector database (Spring AI vector store properties)
    vectorstore:
      milvus:
        client:
          host: ${MILVUS_HOST:localhost}
          port: 19530
        collection-name: customer_knowledge
        embedding-dimension: 1024   # text-embedding-v3 defaults to 1024 dimensions (1536 belongs to v1/v2)
        index-type: HNSW
        metric-type: COSINE
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: 6379
      password: ${REDIS_PASSWORD:}
      lettuce:
        pool:
          max-active: 20
          max-idle: 10
# Resilience4j circuit breaker configuration
resilience4j:
  circuitbreaker:
    instances:
      aiService:
        sliding-window-size: 10
        failure-rate-threshold: 50
        wait-duration-in-open-state: 30s
        permitted-number-of-calls-in-half-open-state: 3
  timelimiter:
    instances:
      aiService:
        timeout-duration: 30s
# Monitoring endpoints
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
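To make the `sliding-window-size: 10` and `failure-rate-threshold: 50` settings concrete, here is a minimal sketch of a count-based sliding window. It is a simplified model for illustration only, not Resilience4j's implementation; the class name `SlidingWindowSketch` and its method are hypothetical, and it assumes the breaker is evaluated only once the window is full.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified count-based sliding window (illustrative; not Resilience4j code). */
class SlidingWindowSketch {
    private final int windowSize;
    private final double failureRateThreshold;                  // in percent, e.g. 50.0
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures = 0;

    SlidingWindowSketch(int windowSize, double failureRateThreshold) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one call outcome and reports whether the breaker should be open. */
    boolean recordAndCheckOpen(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        // Drop the oldest outcome once more than windowSize calls are tracked
        if (outcomes.size() > windowSize) {
            boolean oldestFailed = outcomes.removeFirst();
            if (oldestFailed) {
                failures--;
            }
        }
        // Assumption: no decision until the window holds windowSize calls
        if (outcomes.size() < windowSize) {
            return false;
        }
        return failures * 100.0 / windowSize >= failureRateThreshold;
    }

    public static void main(String[] args) {
        SlidingWindowSketch window = new SlidingWindowSketch(10, 50.0);
        for (int i = 1; i <= 10; i++) {
            boolean open = window.recordAndCheckOpen(i % 2 == 0); // every second call fails
            System.out.println("call " + i + " -> open=" + open);
        }
    }
}
```

With the values from the configuration, the breaker in this model opens as soon as at least 5 of the last 10 calls have failed, and closes again only after enough successful calls push the failures out of the window.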
III. Building the Knowledge Base: From Documents to Searchable Vectors
The knowledge base is the heart of an intelligent customer service system. If its quality is poor, even the best model cannot give accurate answers.
3.1 End-to-End Ingestion Flow
Source documents (PDF/Word/HTML)
          │
          ▼
┌───────────────────┐
│ Document loading  │  TikaDocumentReader / PagePdfDocumentReader
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Cleaning          │  Remove headers/footers, special characters, duplicate paragraphs
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Chunking          │  TokenTextSplitter (512-token chunks; the splitter itself has no overlap option)
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Metadata          │  Source, category, version, last-updated time
│ enrichment        │
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Embedding + store │  text-embedding-v3 → Milvus
└───────────────────┘
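If overlapping chunks are wanted in the chunking step, they have to be produced outside `TokenTextSplitter`, because its constructor takes no overlap argument. The following is a minimal sketch of sliding-window chunking over an already tokenized text; `OverlapChunker` and its parameters are hypothetical names, and the "tokens" here are plain strings rather than model tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Sliding-window chunking with overlap (illustrative sketch). */
class OverlapChunker {

    /**
     * Splits tokens into windows of at most chunkSize tokens, where each window
     * starts (chunkSize - overlap) tokens after the previous one.
     */
    static List<List<String>> chunk(List<String> tokens, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<List<String>> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(List.copyOf(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // the last window already reaches the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        chunk(tokens, 4, 1).forEach(System.out::println);
    }
}
```

With 512-token chunks and a 64-token overlap, each chunk would repeat the last 64 tokens of its predecessor, which helps when a sentence straddles a chunk boundary at the cost of storing roughly 14% more tokens.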
3.2 Knowledge Base Service Implementation
java
@Service
@Slf4j
public class KnowledgeIngestionService {

    private final VectorStore vectorStore;
    private final TokenTextSplitter textSplitter;
    private final MeterRegistry meterRegistry;

    public KnowledgeIngestionService(VectorStore vectorStore,
                                     MeterRegistry meterRegistry) {
        this.vectorStore = vectorStore;
        this.meterRegistry = meterRegistry;
        // Target chunk size of 512 tokens. TokenTextSplitter has no overlap setting:
        // the other arguments are minChunkSizeChars (64), minChunkLengthToEmbed (5),
        // maxNumChunks (10000) and keepSeparator (true).
        this.textSplitter = new TokenTextSplitter(512, 64, 5, 10000, true);
    }

    /**
     * Bulk FAQ import: question/answer pairs read from Excel.
     */
    public void importFAQ(List<FAQItem> faqList) {
        List<Document> documents = faqList.stream()
            .map(faq -> {
                // Merge each Q&A pair into one document to improve retrieval
                String content = String.format(
                    "Question: %s\nAnswer: %s\nTags: %s",
                    faq.getQuestion(),
                    faq.getAnswer(),
                    String.join(", ", faq.getTags())
                );
                // Map.of rejects null values, so every metadata field must be non-null
                return new Document(content, Map.of(
                    "type", "faq",
                    "category", faq.getCategory(),
                    "faq_id", faq.getId(),
                    "version", "1.0",
                    "source", "faq_import"
                ));
            })
            .collect(Collectors.toList());
        ingestDocuments(documents, "faq-import");
    }

    /**
     * Learning from historical tickets: ingest high-quality past conversations.
     * Selection criteria: satisfaction score >= 4 stars and resolved by the AI.
     */
    public void learnFromHistoryTickets(List<Ticket> tickets) {
        List<Document> documents = tickets.stream()
            .filter(t -> t.getSatisfactionScore() != null
                && t.getSatisfactionScore() >= 4
                && t.isAiResolved())
            .map(ticket -> {
                String content = buildTicketContent(ticket);
                return new Document(content, Map.of(
                    "type", "history_ticket",
                    "ticket_id", ticket.getId(),
                    "category", ticket.getCategory(),
                    "created_at", ticket.getCreatedAt().toString(),
                    "score", ticket.getSatisfactionScore().toString()
                ));
            })
            .collect(Collectors.toList());
        log.info("Learning from historical tickets: {} high-quality conversations", documents.size());
        ingestDocuments(documents, "history-tickets");
    }

    /**
     * Product manual import: PDF processing.
     */
    public void importProductManual(Resource pdfResource, String productName) {
        // Read the PDF page by page so page numbers are preserved
        PagePdfDocumentReader reader = new PagePdfDocumentReader(
            pdfResource,
            PdfDocumentReaderConfig.builder()
                .withPageExtractedTextFormatter(
                    ExtractedTextFormatter.builder()
                        // Leave the first page (cover) untouched by line deletion
                        .withNumberOfTopPagesToSkipBeforeDelete(1)
                        // Strip the last two text lines (footer) of every other page
                        .withNumberOfBottomTextLinesToDelete(2)
                        .build()
                )
                .withPagesPerDocument(1)
                .build()
        );
        List<Document> rawDocs = reader.get();
        List<Document> chunks = textSplitter.apply(rawDocs);
        // Attach product metadata
        chunks.forEach(doc -> {
            doc.getMetadata().put("type", "product_manual");
            doc.getMetadata().put("product", productName);
            doc.getMetadata().put("source", pdfResource.getFilename());
        });
        ingestDocuments(chunks, "product-manual-" + productName);
    }

    private void ingestDocuments(List<Document> documents, String sourceName) {
        // Deduplicate by content hash
        List<Document> newDocs = deduplicateDocuments(documents);
        if (newDocs.isEmpty()) {
            log.info("[{}] Nothing new to ingest (all documents are duplicates)", sourceName);
            return;
        }
        long start = System.currentTimeMillis();
        vectorStore.add(newDocs);
        long cost = System.currentTimeMillis() - start;
        // Record metrics
        meterRegistry.counter("knowledge.ingestion.count",
            "source", sourceName).increment(newDocs.size());
        log.info("[{}] Ingested {} documents in {} ms", sourceName, newDocs.size(), cost);
    }

    private List<Document> deduplicateDocuments(List<Document> documents) {
        // Filter out documents whose content MD5 is already known
        // (a real project would keep the ingested hashes in a Redis Set)
        return documents.stream()
            .filter(doc -> !isAlreadyIngested(
                // Fix the charset so the hash does not depend on the platform default
                DigestUtils.md5DigestAsHex(doc.getText().getBytes(StandardCharsets.UTF_8))
            ))
            .collect(Collectors.toList());
    }

    private String buildTicketContent(Ticket ticket) {
        StringBuilder sb = new StringBuilder();
        sb.append("User question: ").append(ticket.getUserQuery()).append("\n");
        sb.append("Category: ").append(ticket.getCategory()).append("\n");
        sb.append("Resolution: ").append(ticket.getResolution()).append("\n");
        if (!ticket.getUsedTools().isEmpty()) {
            sb.append("Operations involved: ").append(String.join(", ", ticket.getUsedTools())).append("\n");
        }
        return sb.toString();
    }

    private boolean isAlreadyIngested(String hash) {
        // Check with Redis SISMEMBER
        return false; // simplified for this example: nothing is ever treated as a duplicate
    }
}
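The `isAlreadyIngested` stub above always returns `false`, so the deduplication step is only outlined. As a minimal sketch of the idea, the class below hashes each piece of content with MD5 and keeps the hashes in an in-memory set that stands in for the Redis Set mentioned in the comment. `ContentDeduplicator` and its method names are hypothetical, and the whitespace trimming before hashing is an added assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Content-hash deduplication; the in-memory set stands in for a Redis Set. */
class ContentDeduplicator {
    private final Set<String> seenHashes = ConcurrentHashMap.newKeySet();

    /** Hex-encoded MD5 of the UTF-8 bytes of the text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Returns only the contents whose hash has not been seen before, and remembers them. */
    List<String> filterNew(List<String> contents) {
        List<String> fresh = new ArrayList<>();
        for (String content : contents) {
            // Set.add returns false when the hash is already present,
            // which mirrors Redis SADD returning 0
            if (seenHashes.add(md5Hex(content.strip()))) {
                fresh.add(content);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        ContentDeduplicator dedup = new ContentDeduplicator();
        System.out.println(dedup.filterNew(List.of("How do I get a refund?", "How do I get a refund?")));
    }
}
```

Checking and recording in a single `add` call avoids the race in which two concurrent imports both see a hash as missing and ingest the same document twice.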
3.3 Multi-Turn Dialog Tree Design
For complex requests such as refunds or account unblocking, the system has to guide the user to supply information step by step:
java
/**
 * A node in the multi-turn dialog tree.
 * Guides the user through a complex business flow (refund, appeal, and so on).
 */
@Data
public class DialogNode {
    private String nodeId;
    private String question;        // the question to ask the user at this node
    private NodeType type;          // QUESTION / ACTION / END
    private List<Branch> branches;  // branches chosen by the user's answer
    private String toolToCall;      // tool to execute when this node is reached

    @Data
    public static class Branch {
        private String condition;   // match condition (keyword or regex)
        private String nextNodeId;  // node to jump to
        private String slotName;    // name of the slot that gets filled
        private String slotValue;   // value of the slot that gets filled
    }
}
// Refund dialog tree configuration (YAML, loaded at runtime)
/*
refund_dialog_tree:
  - nodeId: "start"
    question: "Hello, would you like to request a refund? Please provide your order number."
    type: QUESTION
    branches:
      - condition: "\\d{10,20}"   # matches the order number format
        nextNodeId: "verify_order"
        slotName: "order_id"
  - nodeId: "verify_order"
    type: ACTION
    toolToCall: "queryOrderTool"
    branches:
      - condition: "found"
        nextNodeId: "ask_reason"
      - condition: "not_found"
        nextNodeId: "order_not_found"
  - nodeId: "ask_reason"
    question: "Please choose a refund reason: 1. Quality problem 2. Not as described 3. No longer wanted 4. Other"
    type: QUESTION
    branches:
      - condition: "1|quality"
        nextNodeId: "submit_refund"
        slotName: "refund_reason"
        slotValue: "quality_issue"
      - condition: "2|description"
        nextNodeId: "submit_refund"
        slotName: "refund_reason"
        slotValue: "description_mismatch"
*/
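The configuration above describes the tree but not how it is walked. A minimal sketch of one step of that walk follows: it tests the user's input against each branch condition as a regular expression, fills the slot, and returns the next node id. `DialogTreeSketch`, `advance`, and the rule that a branch without a fixed `slotValue` captures the matched text are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One step of a dialog-tree walk (illustrative sketch). */
class DialogTreeSketch {
    record Branch(String condition, String nextNodeId, String slotName, String slotValue) {}
    record Node(String nodeId, List<Branch> branches) {}

    /**
     * Returns the id of the next node, or null when no branch matches.
     * Fills the branch's slot in the given map as a side effect.
     */
    static String advance(Node node, String userInput, Map<String, String> slots) {
        for (Branch branch : node.branches()) {
            Matcher matcher = Pattern.compile(branch.condition()).matcher(userInput);
            if (matcher.find()) {
                if (branch.slotName() != null) {
                    // Assumption: a fixed slotValue wins; otherwise store the matched text
                    String value = branch.slotValue() != null ? branch.slotValue() : matcher.group();
                    slots.put(branch.slotName(), value);
                }
                return branch.nextNodeId();
            }
        }
        return null; // caller should re-ask the question or hand over to a human
    }

    public static void main(String[] args) {
        Node start = new Node("start",
                List.of(new Branch("\\d{10,20}", "verify_order", "order_id", null)));
        Map<String, String> slots = new HashMap<>();
        String next = advance(start, "My order number is 123456789012345", slots);
        System.out.println(next + " " + slots);
    }
}
```

Because `find()` matches anywhere in the input, a short condition such as `1|quality` also fires on any message that merely contains the digit 1, so numeric menu choices are safer when anchored, for example `^\s*1\s*$|quality`.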
IV. Intent Recognition and Slot Filling
4.1 Intent Recognition Service
Intent recognition is the "router" of the whole system. A single user message decides whether the request goes to RAG Q&A, to tool calling, or to a human agent.
java
@Service
@Slf4j
public class IntentRecognitionService {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    private final ChatClient chatClient;
    private final MeterRegistry meterRegistry;

    // Spring AI auto-configures ChatClient.Builder, not ChatClient, so build the client here
    public IntentRecognitionService(ChatClient.Builder builder, MeterRegistry meterRegistry) {
        this.chatClient = builder.build();
        this.meterRegistry = meterRegistry;
    }

    // Supported intents (a real project loads them from the database and supports hot reload)
    private static final String INTENT_RECOGNITION_PROMPT = """
        You are a professional intent recognition engine. Analyze the user input and identify its intent and slots.
        Supported intents:
        - ORDER_QUERY: order status and logistics queries
        - REFUND_REQUEST: refund or return requests
        - PRODUCT_INQUIRY: product questions and feature introductions
        - ACCOUNT_ISSUE: account problems, password reset, account unblocking
        - COMPLAINT: complaints and suggestions
        - HUMAN_TRANSFER: explicit request for a human agent
        - GENERAL_QA: general questions
        - UNKNOWN: intent cannot be recognized
        Output JSON with the following fields:
        - intent: intent name
        - confidence: confidence (0-1)
        - slots: extracted slots (key-value pairs)
        - need_more_info: whether a follow-up question is needed (true/false)
        - missing_slots: list of required slots that are missing
        User input: {userInput}
        """;

    public IntentResult recognize(String userInput, String sessionId) {
        long start = System.currentTimeMillis();
        try {
            // Ask for structured output and parse the intent from it
            String response = chatClient.prompt()
                .user(u -> u.text(INTENT_RECOGNITION_PROMPT)
                    .param("userInput", userInput))
                // Spring AI 1.0 replaced ChatOptionsBuilder.withXxx() with ChatOptions.builder()
                .options(ChatOptions.builder()
                    .model("qwen-turbo")   // lightweight model for intent recognition to reduce cost
                    .temperature(0.0)      // deterministic output
                    .maxTokens(512)
                    .build())
                .call()
                .content();
            IntentResult result = parseIntentResult(response, userInput);
            // Record metrics
            meterRegistry.counter("intent.recognition.count",
                    "intent", result.getIntent(),
                    "confidence_level", result.getConfidenceLevel())
                .increment();
            log.info("[Session:{}] Intent result: intent={}, confidence={}, slots={}",
                sessionId, result.getIntent(), result.getConfidence(), result.getSlots());
            return result;
        } catch (Exception e) {
            log.error("[Session:{}] Intent recognition failed: {}", sessionId, e.getMessage());
            // On failure fall back to the general Q&A intent so the service stays available
            return IntentResult.fallback(userInput);
        } finally {
            long cost = System.currentTimeMillis() - start;
            meterRegistry.timer("intent.recognition.latency").record(cost, TimeUnit.MILLISECONDS);
        }
    }

    /**
     * Slot filling: complete missing slots from the conversation context.
     */
    public SlotFillingResult fillSlots(String intent, Map<String, String> existingSlots,
                                       String userInput) {
        // Required slots per intent
        Map<String, List<String>> requiredSlots = Map.of(
            "ORDER_QUERY", List.of("order_id"),
            "REFUND_REQUEST", List.of("order_id", "refund_reason"),
            "ACCOUNT_ISSUE", List.of("account_id", "issue_type")
        );
        List<String> required = requiredSlots.getOrDefault(intent, List.of());
        List<String> missing = required.stream()
            .filter(slot -> !existingSlots.containsKey(slot))
            .collect(Collectors.toList());
        if (missing.isEmpty()) {
            return SlotFillingResult.complete(existingSlots);
        }
        // Generate the follow-up question for the first missing slot
        String nextQuestion = generateFollowUpQuestion(intent, missing.get(0));
        return SlotFillingResult.incomplete(nextQuestion, missing.get(0));
    }

    private String generateFollowUpQuestion(String intent, String missingSlot) {
        Map<String, String> questionTemplates = Map.of(
            "order_id", "Please provide your order number (usually 15-20 digits)",
            "refund_reason", "What is the reason for your refund?",
            "account_id", "Please provide your account name or registered phone number",
            "issue_type", "Please describe the account problem you are facing"
        );
        return questionTemplates.getOrDefault(missingSlot, "Please provide more information");
    }

    private IntentResult parseIntentResult(String response, String userInput) {
        try {
            // Parse the JSON with Jackson; extractJson, parseSlots and parseList are helpers omitted here
            JsonNode node = MAPPER.readTree(extractJson(response));
            // path() never returns null, so a missing field degrades to a default instead of an NPE
            return IntentResult.builder()
                .intent(node.path("intent").asText("UNKNOWN"))
                .confidence(node.path("confidence").asDouble(0.0))
                .slots(parseSlots(node.path("slots")))
                .needMoreInfo(node.path("need_more_info").asBoolean(false))
                .missingSlots(parseList(node.path("missing_slots")))
                .build();
        } catch (Exception e) {
            return IntentResult.fallback(userInput);
        }
    }
}
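`parseIntentResult` calls an `extractJson` helper that the listing does not show. Models often wrap JSON in explanatory text or a Markdown fence, so some extraction step is needed before handing the string to Jackson. One possible sketch, under the assumption that the reply contains a single JSON object, scans for the first balanced `{...}` while ignoring braces inside string literals; the class name `JsonExtractor` is hypothetical.

```java
/** Pulls the first balanced JSON object out of free-form model output (illustrative sketch). */
class JsonExtractor {

    /** Returns the first balanced {...} substring, or null if there is none. */
    static String extractJson(String response) {
        if (response == null) {
            return null;
        }
        int start = response.indexOf('{');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < response.length(); i++) {
            char c = response.charAt(i);
            if (inString) {
                // Inside a string literal only an unescaped quote matters
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return response.substring(start, i + 1);
                }
            }
        }
        return null; // braces never balanced, e.g. the output was truncated
    }

    public static void main(String[] args) {
        String reply = "Here is the result:\n{\"intent\":\"ORDER_QUERY\",\"confidence\":0.92}\nHope this helps.";
        System.out.println(extractJson(reply));
    }
}
```

Returning `null` for truncated output lets the caller fall back to `IntentResult.fallback(...)`, which is what the surrounding `catch` block already does when parsing fails.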
4.2 Smart Request Router
java
@Service
@Slf4j
@RequiredArgsConstructor
public class RequestRouter {

    private final IntentRecognitionService intentService;
    private final CustomerServiceRAG ragService;
    private final CustomerServiceAgent agentService;
    private final HumanEscalationService escalationService;

    /**
     * Core routing logic:
     * the intent decides which processing pipeline handles the request.
     */
    public Flux<String> route(String userInput, String sessionId,
                              ConversationContext context) {
        IntentResult intent = intentService.recognize(userInput, sessionId);
        // Merge newly extracted slots into the session context so they survive across turns
        if (intent.getSlots() != null) {
            context.getSlots().putAll(intent.getSlots());
        }
        // 1. Slot-filling check
        if (intent.isNeedMoreInfo()) {
            SlotFillingResult slotResult = intentService.fillSlots(
                intent.getIntent(), context.getSlots(), userInput);
            if (!slotResult.isComplete()) {
                context.setPendingSlot(slotResult.getMissingSlot());
                return Flux.just(slotResult.getFollowUpQuestion());
            }
        }
        return switch (intent.getIntent()) {
            // Order queries, refunds, account issues: tools are required, go through the Agent
            case "ORDER_QUERY", "REFUND_REQUEST", "ACCOUNT_ISSUE" ->
                agentService.processWithTools(userInput, sessionId,
                    context.getSlots(), context);
            // Product questions: go through the RAG knowledge base
            case "PRODUCT_INQUIRY", "GENERAL_QA" ->
                ragService.queryWithContext(userInput, sessionId, context);
            // Complaints: soothe first, record a ticket, then decide whether to escalate
            case "COMPLAINT" ->
                handleComplaint(userInput, sessionId, intent, context);
            // Explicit request for a human agent
            case "HUMAN_TRANSFER" ->
                escalationService.transferToHuman(sessionId, context);
            // UNKNOWN or any unexpected intent, whatever the confidence: RAG as the catch-all
            default ->
                ragService.queryWithContext(userInput, sessionId, context);
        };
    }

    private Flux<String> handleComplaint(String userInput, String sessionId,
                                         IntentResult intent, ConversationContext context) {
        // Send the soothing message immediately, then stream the answer
        return Flux.concat(
            Flux.just("We are very sorry for the inconvenience. I have recorded your feedback. "),
            ragService.queryWithContext(
                "Provide reassurance and a solution for the following complaint: " + userInput,
                sessionId, context),
            // Runs after the answer stream completes; offloaded to a worker thread so a
            // blocking ticket call does not stall the event loop
            Mono.<String>fromRunnable(() -> createComplaintTicket(sessionId, userInput, intent))
                .subscribeOn(Schedulers.boundedElastic())
        );
    }
}
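The routing rules can also be expressed as a small pure function, which makes them easy to unit test without Spring or a model call. The sketch below is one possible reading of the rules, not the article's implementation: `RouteDecisionSketch`, the `Route` enum, and in particular the choice to apply the low-confidence check to every intent except an explicit human-transfer request are assumptions.

```java
import java.util.Set;

/** Routing rules as a pure function (illustrative sketch). */
class RouteDecisionSketch {
    enum Route { AGENT, RAG, COMPLAINT, HUMAN }

    private static final Set<String> TOOL_INTENTS =
            Set.of("ORDER_QUERY", "REFUND_REQUEST", "ACCOUNT_ISSUE");

    static Route decide(String intent, double confidence, double minConfidence) {
        // Assumption: an explicit request for a human is honoured regardless of confidence
        if ("HUMAN_TRANSFER".equals(intent)) {
            return Route.HUMAN;
        }
        // Assumption: a low-confidence classification is not trusted, so RAG answers instead
        if (confidence < minConfidence) {
            return Route.RAG;
        }
        if (TOOL_INTENTS.contains(intent)) {
            return Route.AGENT;
        }
        if ("COMPLAINT".equals(intent)) {
            return Route.COMPLAINT;
        }
        return Route.RAG; // PRODUCT_INQUIRY, GENERAL_QA, UNKNOWN and anything unexpected
    }

    public static void main(String[] args) {
        System.out.println(decide("REFUND_REQUEST", 0.91, 0.5));
        System.out.println(decide("REFUND_REQUEST", 0.30, 0.5));
    }
}
```

Keeping the decision separate from the `Flux` plumbing also gives a single place to tune the confidence threshold.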
V. RAG Knowledge Base Q&A
5.1 Enhanced RAG Service
java
@Service
@Slf4j
public class CustomerServiceRAG {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;
    private final ChatMemory chatMemory;

    private static final String SYSTEM_PROMPT = """
        You are a professional and patient customer service assistant. Follow these principles:
        1. Prefer information from the knowledge base; answers must be accurate and specific
        2. If the knowledge base has no relevant information, say so honestly and suggest a human agent
        3. Keep a friendly tone and avoid stiff technical jargon
        4. For operations that touch account security, always remind the user to protect their privacy
        5. Keep replies within 300 characters and list important steps with numbers
        """;

    // The ChatMemoryRepository (for example a Redis-backed one) is injected; the original
    // constructor referenced a redisTemplate variable that was never declared.
    public CustomerServiceRAG(ChatClient.Builder builder, VectorStore vectorStore,
                              ChatMemoryRepository chatMemoryRepository) {
        this.vectorStore = vectorStore;
        this.chatMemory = MessageWindowChatMemory.builder()
            .chatMemoryRepository(chatMemoryRepository)
            .maxMessages(15) // keep the 15 most recent messages
            .build();
        this.chatClient = builder
            .defaultSystem(SYSTEM_PROMPT)
            .defaultAdvisors(
                // RAG retrieval augmentation
                QuestionAnswerAdvisor.builder(vectorStore)
                    .searchRequest(SearchRequest.builder()
                        .topK(5)
                        .similarityThreshold(0.72) // results below this similarity are not cited
                        .build())
                    .build(),
                // Multi-turn conversation memory
                MessageChatMemoryAdvisor.builder(chatMemory).build(),
                // Request/response logging (only emitted at DEBUG level)
                SimpleLoggerAdvisor.builder().build()
            )
            .build();
    }

    /**
     * Streaming RAG Q&A.
     */
    public Flux<String> queryWithContext(String userInput, String sessionId,
                                         ConversationContext context) {
        return chatClient.prompt()
            .user(userInput)
            // Spring AI 1.0 exposes the conversation id key as ChatMemory.CONVERSATION_ID
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, sessionId))
            .stream()
            .content()
            .doOnComplete(() ->
                log.debug("[RAG] Session {} answer completed", sessionId))
            .onErrorResume(e -> {
                log.error("[RAG] Session {} failed: {}", sessionId, e.getMessage());
                return Flux.just("Sorry, the system ran into a problem. Please try again later or contact a human agent.");
            });
    }

    /**
     * Precise retrieval with a metadata filter,
     * for example restricting the search to one product line.
     */
    public Flux<String> queryWithFilter(String userInput, String sessionId,
                                        String productCategory) {
        // Reuse the default advisors and pass the filter as a request parameter. Adding a second
        // QuestionAnswerAdvisor and MessageChatMemoryAdvisor here would run retrieval and memory
        // twice. The request inherits topK=5 and the 0.72 threshold from the default advisor.
        String filter = "category == '" + productCategory.replace("'", "") + "'";
        return chatClient.prompt()
            .user(userInput)
            .advisors(a -> a
                .param(ChatMemory.CONVERSATION_ID, sessionId)
                .param(QuestionAnswerAdvisor.FILTER_EXPRESSION, filter))
            .stream()
            .content();
    }
}
VI. Agent Tool Calling
6.1 Defining the Customer Service Toolset
java
/**
 * Customer service toolset.
 * Uses the @Tool and @ToolParam annotations from Spring AI
 * (org.springframework.ai.tool.annotation), on which Spring AI Alibaba 1.1 builds.
 */
@Component
@Slf4j
@RequiredArgsConstructor
public class CustomerServiceTools {

    private final OrderService orderService;
    private final AccountService accountService;
    private final TicketService ticketService;

    @Tool(description = "Query order information, including order status, logistics, and item details. Requires the order number.")
    public OrderInfo queryOrder(@ToolParam(description = "Order number, 15-20 digits") String orderId) {
        log.info("[Tool] Query order: {}", orderId);
        try {
            return orderService.getOrderById(orderId);
        } catch (OrderNotFoundException e) {
            return OrderInfo.notFound(orderId);
        }
    }

    @Tool(description = "Submit a refund request. Requires the order number and the refund reason.")
    public RefundResult submitRefund(
            @ToolParam(description = "Order number") String orderId,
            @ToolParam(description = "Refund reason: quality_issue/description_mismatch/dont_want/other")
            String reason) {
        log.info("[Tool] Submit refund: orderId={}, reason={}", orderId, reason);
        // Parameter validation
        if (!isValidOrderId(orderId)) {
            return RefundResult.error("The order number format is invalid");
        }
        return orderService.submitRefund(orderId, reason);
    }

    @Tool(description = "Query the account status, including whether it is blocked, why, and how to unblock it.")
    public AccountStatus queryAccountStatus(
            @ToolParam(description = "Account name or registered phone number") String accountId) {
        log.info("[Tool] Query account status: {}", desensitize(accountId));
        return accountService.getAccountStatus(accountId);
    }

    @Tool(description = "Reset the user's password by sending a reset link to the registered email or phone.")
    public ResetResult resetPassword(
            @ToolParam(description = "Account name or registered phone number") String accountId) {
        log.info("[Tool] Reset password: {}", desensitize(accountId));
        return accountService.sendPasswordResetLink(accountId);
    }

    @Tool(description = "Create a customer service ticket that records the user's problem. Use it for complex issues that need follow-up.")
    public TicketResult createTicket(
            @ToolParam(description = "Problem description") String description,
            @ToolParam(description = "Category: order/account/product/complaint/other")
            String category,
            @ToolParam(description = "Priority: low/medium/high/urgent")
            String priority) {
        log.info("[Tool] Create ticket: category={}, priority={}", category, priority);
        return ticketService.create(description, category, priority);
    }

    @Tool(description = "Query the logistics trail. Requires the tracking number or the order number.")
    public LogisticsInfo queryLogistics(
            @ToolParam(description = "Tracking number or order number") String trackingId) {
        log.info("[Tool] Query logistics: {}", trackingId);
        return orderService.getLogistics(trackingId);
    }

    private boolean isValidOrderId(String orderId) {
        return orderId != null && orderId.matches("\\d{10,20}");
    }

    private String desensitize(String input) {
        // With fewer than 6 characters the visible prefix (3) and suffix (2) would expose
        // almost the whole value, so mask it completely
        if (input == null || input.length() < 6) return "***";
        return input.substring(0, 3) + "***" + input.substring(input.length() - 2);
    }
}
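6.2 The Customer Service Agent
java
@Service
@Slf4j
public class CustomerServiceAgent {

    private final ChatClient agentClient;

    private static final String AGENT_SYSTEM_PROMPT = """
        You are a professional intelligent customer service agent who can query orders, process refunds, and resolve account problems.
        Working principles:
        1. Prefer tools to fetch real-time data; never guess the state of a user's order
        2. Before calling a tool, make sure the required slots (order number, account, and so on) are available
        3. When a tool call fails, give a friendly message and explain the likely cause
        4. For sensitive operations such as refunds, confirm with the user once more before executing
        5. After every step, tell the user clearly what happened and what comes next
        """;

    public CustomerServiceAgent(ChatClient.Builder builder,
                                CustomerServiceTools tools,
                                ChatMemory chatMemory) {
        this.agentClient = builder
            .defaultSystem(AGENT_SYSTEM_PROMPT)
            .defaultTools(tools) // register the toolset
            .defaultAdvisors(
                MessageChatMemoryAdvisor.builder(chatMemory).build()
            )
            .build();
    }

    /**
     * Streaming Agent processing (with tool calls).
     */
    public Flux<String> processWithTools(String userInput, String sessionId,
                                         Map<String, String> slots,
                                         ConversationContext context) {
        // Append the collected slots to the user input so the tools can use them
        String enrichedInput = buildEnrichedInput(userInput, slots);
        return agentClient.prompt()
            .user(enrichedInput)
            // Spring AI 1.0 exposes the conversation id key as ChatMemory.CONVERSATION_ID
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, sessionId))
            .stream()
            .content()
            // On a Flux, timeout() limits the wait for the first chunk and the gap between
            // chunks; it is not a cap on the total duration of the stream
            .timeout(Duration.ofSeconds(30))
            .onErrorResume(TimeoutException.class, e -> {
                log.warn("[Agent] Session {} tool call timed out", sessionId);
                // NOTE: generateTicketId() only produces an ID. A real implementation must
                // persist the ticket before quoting its number to the user.
                return Flux.just("This is taking longer than expected. A ticket has been created and an agent will follow up within 1 hour. Ticket number: "
                    + generateTicketId());
            })
            .onErrorResume(e -> {
                log.error("[Agent] Session {} processing failed: {}", sessionId, e.getMessage());
                return Flux.just("We are very sorry, something went wrong while processing your request. Would you like to be transferred to a human agent?");
            });
    }

    private String buildEnrichedInput(String userInput, Map<String, String> slots) {
        if (slots == null || slots.isEmpty()) return userInput;
        StringBuilder sb = new StringBuilder(userInput);
        sb.append("\n[System-supplied information]");
        slots.forEach((k, v) -> sb.append(String.format(" %s=%s", k, v)));
        return sb.toString();
    }

    private String generateTicketId() {
        return "TK" + System.currentTimeMillis();
    }
}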
VII. Long-Lived WebSocket Connections and Session Management
7.1 WebSocket Handler
java
@Component
@Slf4j
@RequiredArgsConstructor
public class CustomerServiceWebSocketHandler implements WebSocketHandler {

    private final RequestRouter requestRouter;
    private final SessionManager sessionManager;
    private final HumanEscalationService escalationService;

    // All active connections (sessionId -> WebSocketSession)
    private final ConcurrentHashMap<String, WebSocketSession> activeSessions =
        new ConcurrentHashMap<>();

    @Override
    public Mono<Void> handle(WebSocketSession wsSession) {
        String sessionId = extractSessionId(wsSession);
        String userId = extractUserId(wsSession);
        log.info("[WS] Connection opened: sessionId={}, userId={}", sessionId, userId);
        // Register the connection
        activeSessions.put(sessionId, wsSession);
        sessionManager.createSession(sessionId, userId);
        // Welcome message
        String welcomeMsg = buildWelcomeMessage(userId);
        Flux<WebSocketMessage> outbound = wsSession.receive()
            .map(WebSocketMessage::getPayloadAsText)
            // concatMap answers messages in arrival order; flatMap could interleave
            // the chunks of two streamed answers
            .concatMap(payload -> handleMessage(payload, sessionId, wsSession))
            // Push the welcome message first
            .startWith(wsSession.textMessage(welcomeMsg));
        return wsSession.send(outbound)
            .doFinally(signal -> {
                log.info("[WS] Connection closed: sessionId={}, signal={}", sessionId, signal);
                activeSessions.remove(sessionId);
                sessionManager.closeSession(sessionId);
            });
    }

    private Flux<WebSocketMessage> handleMessage(String payload, String sessionId,
                                                 WebSocketSession wsSession) {
        try {
            ChatMessage chatMessage = parseMessage(payload);
            ConversationContext context = sessionManager.getContext(sessionId);
            // Emit the "typing" status as the first frame of this reply. A separate
            // wsSession.send(...).subscribe() would compete with the outbound stream in handle().
            return Flux.just(wsSession.textMessage(buildTypingMessage(sessionId)))
                // Route the request and stream the response
                .concatWith(requestRouter.route(chatMessage.getContent(), sessionId, context)
                    .map(chunk -> buildResponseMessage(sessionId, chunk))
                    .map(wsSession::textMessage))
                // Append the end marker
                .concatWith(Mono.just(wsSession.textMessage(buildEndMessage(sessionId))))
                // Errors raised while streaming are not caught by the try/catch below
                .onErrorResume(e -> {
                    log.error("[WS] Streaming failed: sessionId={}", sessionId, e);
                    return Flux.just(wsSession.textMessage(
                        buildErrorMessage(sessionId, "Message processing failed, please try again")));
                });
        } catch (Exception e) {
            log.error("[WS] Message handling failed: sessionId={}", sessionId, e);
            String errMsg = buildErrorMessage(sessionId, "Message processing failed, please try again");
            return Flux.just(wsSession.textMessage(errMsg));
        }
    }

    /**
     * Server-initiated push (for example to announce that a human agent has joined).
     * Simplified: this extra send() runs alongside the outbound stream in handle(); a
     * production version would route pushes through a per-session sink merged into that stream.
     */
    public void pushMessageToSession(String sessionId, String content) {
        WebSocketSession session = activeSessions.get(sessionId);
        if (session != null && session.isOpen()) {
            String message = buildSystemMessage(sessionId, content);
            session.send(Mono.just(session.textMessage(message))).subscribe();
        }
    }

    private String buildWelcomeMessage(String userId) {
        // Build the JSON on a single line: a raw line break inside a JSON string is invalid
        return String.format(
            "{\"type\":\"welcome\",\"content\":\"%s\",\"timestamp\":%d}",
            escapeJson("Hello! I am your virtual assistant and glad to help. What can I do for you?"),
            System.currentTimeMillis());
    }

    private String buildResponseMessage(String sessionId, String chunk) {
        return String.format(
            "{\"type\":\"message\",\"sessionId\":\"%s\",\"content\":\"%s\",\"timestamp\":%d}",
            sessionId, escapeJson(chunk), System.currentTimeMillis()
        );
    }
    // extractSessionId, extractUserId, parseMessage, escapeJson and the remaining
    // build*Message helpers are omitted here
}
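`buildResponseMessage` depends on an `escapeJson` helper that is not shown. Model output routinely contains quotes and line breaks, so skipping this step produces frames the client cannot parse. A minimal sketch of such a helper follows; the class name `JsonEscaper` is hypothetical, and in a real project serializing a message object with Jackson is usually the simpler choice.

```java
/** Escapes a string for embedding inside a JSON string literal (illustrative sketch). */
class JsonEscaper {

    static String escapeJson(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                case '\b' -> sb.append("\\b");
                case '\f' -> sb.append("\\f");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters must be written as \\uXXXX escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String chunk = "Step 1: open \"My Orders\"\nStep 2: tap Refund";
        System.out.println("{\"content\":\"" + escapeJson(chunk) + "\"}");
    }
}
```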
7.2 Distributed Session Management
java
@Service
@Slf4j
@RequiredArgsConstructor
public class SessionManager {

    private final RedisTemplate<String, Object> redisTemplate;

    private static final Duration SESSION_TTL = Duration.ofHours(2);
    private static final String SESSION_KEY_PREFIX = "cs:session:";

    /**
     * Create a session and initialize its context.
     */
    public ConversationContext createSession(String sessionId, String userId) {
        ConversationContext context = ConversationContext.builder()
            .sessionId(sessionId)
            .userId(userId)
            .startTime(Instant.now())
            .state(SessionState.ACTIVE)
            .slots(new HashMap<>())
            .messageCount(0)
            .isHandledByAI(true)
            .build();
        saveContext(sessionId, context);
        log.info("[Session] Session created: sessionId={}, userId={}", sessionId, userId);
        return context;
    }

    /**
     * Update the session context (called after every turn).
     * This is a plain read-modify-write, so concurrent updates to the same session can overwrite each other.
     */
    public void updateContext(String sessionId, Consumer<ConversationContext> updater) {
        ConversationContext context = getContext(sessionId);
        updater.accept(context);
        context.setLastActiveTime(Instant.now());
        context.setMessageCount(context.getMessageCount() + 1);
        saveContext(sessionId, context);
    }

    public ConversationContext getContext(String sessionId) {
        String key = SESSION_KEY_PREFIX + sessionId;
        Object value = redisTemplate.opsForValue().get(key);
        if (value == null) {
            // Session not found: return a default context to avoid an NPE
            return ConversationContext.defaultContext(sessionId);
        }
        return (ConversationContext) value;
    }

    private void saveContext(String sessionId, ConversationContext context) {
        String key = SESSION_KEY_PREFIX + sessionId;
        // Each save resets the 2-hour TTL
        redisTemplate.opsForValue().set(key, context, SESSION_TTL);
    }

    public void closeSession(String sessionId) {
        updateContext(sessionId, ctx -> {
            ctx.setState(SessionState.CLOSED);
            ctx.setEndTime(Instant.now());
        });
    }

    /**
     * Get a user's active sessions across all devices (multi-device sync).
     * The set under cs:user_sessions:{userId} must be maintained when sessions are created and closed.
     */
    public List<String> getUserActiveSessions(String userId) {
        String userSessionKey = "cs:user_sessions:" + userId;
        Set<Object> sessionIds = redisTemplate.opsForSet().members(userSessionKey);
        return sessionIds == null ? List.of() :
            sessionIds.stream().map(Object::toString).collect(Collectors.toList());
    }
}
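7.3 WebSocket Configuration
java
@Configuration
public class WebSocketConfig {

    // CustomerServiceWebSocketHandler implements the reactive WebSocketHandler
    // (Mono<Void> handle(WebSocketSession)), so it is registered through a WebFlux
    // HandlerMapping. The servlet-based @EnableWebSocket / WebSocketConfigurer API
    // expects a different handler interface and cannot register it.
    @Bean
    public HandlerMapping webSocketHandlerMapping(CustomerServiceWebSocketHandler handler) {
        Map<String, WebSocketHandler> urlMap = Map.of("/ws/customer-service", handler);
        // Order -1 places this mapping ahead of the annotated controllers
        return new SimpleUrlHandlerMapping(urlMap, -1);
    }
    // WebFlux offers no SockJS fallback. In production, restrict allowed origins to
    // concrete domains, for example through CORS configuration on this mapping.
}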
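VIII. Circuit Breaking, Fallback, and Handover to Human Agents
This is the most important reliability design in the whole system. When the AI service fails, the user experience must not collapse with it.
8.1 Circuit Breaker Setup
java
@Component
@Slf4j
@RequiredArgsConstructor
public class AIServiceCircuitBreaker {

    private final RequestRouter requestRouter;
    private final HumanEscalationService escalationService;
    private final MeterRegistry meterRegistry;

    /**
     * AI service call protected by a circuit breaker.
     * When the breaker is open the call is handed over to a human agent automatically.
     *
     * The method returns the Flux itself. Wrapping it in a CompletableFuture would complete
     * the future as soon as the cold Flux is assembled, so the breaker and time limiter would
     * never observe a failure. The Resilience4j annotations decorate Reactor types when
     * resilience4j-reactor is on the classpath. Thread-pool bulkheads only apply to
     * CompletableFuture, so the default semaphore bulkhead is used here.
     */
    @CircuitBreaker(name = "aiService", fallbackMethod = "fallbackToHuman")
    @TimeLimiter(name = "aiService")
    @Bulkhead(name = "aiService")
    public Flux<String> callAIService(
            String userInput, String sessionId, ConversationContext context) {
        // defer() moves intent recognition inside the subscription so its errors are counted too.
        // Errors already converted to apology messages downstream are invisible to the breaker.
        return Flux.defer(() -> requestRouter.route(userInput, sessionId, context));
    }

    /**
     * Fallback after the breaker trips:
     * transfer to a human agent and tell the user.
     */
    public Flux<String> fallbackToHuman(
            String userInput, String sessionId, ConversationContext context,
            Throwable ex) {
        log.warn("[CircuitBreaker] AI service unavailable, session {} falls back to a human agent. Cause: {}",
            sessionId, ex.getMessage());
        // Record the fallback event
        meterRegistry.counter("circuit.breaker.fallback",
                "service", "ai",
                "reason", ex.getClass().getSimpleName())
            .increment();
        // Transfer on a worker thread because publishing the event may block
        return Mono.fromCallable(() -> escalationService.transferAsync(sessionId, context))
            .subscribeOn(Schedulers.boundedElastic())
            .map(transferred -> transferred
                ? "Hello, the virtual assistant is under maintenance. We are connecting you to a human agent, please wait..."
                : buildQueueMessage(sessionId))
            .flux();
    }

    private String buildQueueMessage(String sessionId) {
        int queuePosition = escalationService.getQueuePosition(sessionId);
        // The wait estimate assumes roughly 3 minutes per queued customer
        return String.format(
            "Hello, all human agents are busy. Your position in the queue: %d. " +
            "Estimated waiting time: %d minutes. You can also leave your contact details and we will get back to you.",
            queuePosition, queuePosition * 3
        );
    }
}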
8.2 人工坐席调度服务
java
@Service
@Slf4j
public class HumanEscalationService {
private final AgentPoolManager agentPool;
private final SessionManager sessionManager;
private final CustomerServiceWebSocketHandler wsHandler;
private final KafkaTemplate<String, Object> kafkaTemplate;
/**
* 同步转人工(可立即接通)
*/
public Flux<String> transferToHuman(String sessionId, ConversationContext context) {
Agent availableAgent = agentPool.findAvailableAgent(context.getCategory());
if (availableAgent != null) {
// 有空闲坐席,立即接通
assignAgent(sessionId, availableAgent, context);
return Flux.just(
String.format("已为您接通人工客服 %s,请问有什么可以帮您?",
availableAgent.getDisplayName())
);
} else {
// 无空闲坐席,进入排队
int position = addToQueue(sessionId, context);
// 注册坐席可用事件监听
registerQueueCallback(sessionId);
return Flux.just(
String.format("当前人工客服繁忙,您已进入排队队列,排队序号:%d。" +
"您可以继续与智能助手对话,有空闲坐席时系统会自动为您接通。",
position)
);
}
}
/**
* 异步转人工(熔断降级时调用)
*/
public boolean transferAsync(String sessionId, ConversationContext context) {
try {
// 发布转人工事件到 Kafka,由调度服务异步处理
TransferEvent event = TransferEvent.builder()
.sessionId(sessionId)
.userId(context.getUserId())
.reason(TransferReason.AI_FALLBACK)
.priority(context.isVipUser() ? Priority.HIGH : Priority.NORMAL)
.conversationSummary(generateSummary(context))
.timestamp(Instant.now())
.build();
kafkaTemplate.send("customer-service-transfer", sessionId, event);
// 更新会话状态
sessionManager.updateContext(sessionId, ctx ->
ctx.setState(SessionState.WAITING_HUMAN));
return true;
} catch (Exception e) {
log.error("[人工转接] 发送转接事件失败: {}", e.getMessage());
return false;
}
}
/**
* 获取排队位置
*/
public int getQueuePosition(String sessionId) {
return agentPool.getQueuePosition(sessionId);
}
/**
* 坐席接单时:推送历史对话摘要(帮助坐席快速了解情况)
*/
public void onAgentAccept(String sessionId, String agentId) {
ConversationContext context = sessionManager.getContext(sessionId);
String summary = generateSummary(context);
// 推送摘要给坐席端
wsHandler.pushMessageToSession("agent-" + agentId,
buildHandoverMessage(sessionId, summary));
// 通知用户
wsHandler.pushMessageToSession(sessionId,
"人工客服已接入,将为您继续服务。");
// 更新会话状态
sessionManager.updateContext(sessionId, ctx -> {
ctx.setState(SessionState.HUMAN_HANDLING);
ctx.setAssignedAgentId(agentId);
ctx.setIsHandledByAI(false);
});
}
/**
* 生成对话摘要(用于人工坐席快速了解背景)
*/
private String generateSummary(ConversationContext context) {
return String.format(
"用户ID: %s | 意图: %s | 已收集槽位: %s | 对话轮次: %d | 开始时间: %s",
context.getUserId(),
context.getLastIntent(),
context.getSlots(),
context.getMessageCount(),
context.getStartTime()
);
}
}
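The queue behind `addToQueue` and `getQueuePosition` is not shown in this article. As a rough illustration of the bookkeeping involved, here is a minimal in-memory sketch. It assumes a single-node FIFO queue; the class name `WaitingQueue` and its methods are hypothetical, and a multi-node deployment would more likely keep this state in Redis so every instance sees the same order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-node FIFO waiting queue; not the article's AgentPool. */
final class WaitingQueue {
    private final List<String> sessions = new ArrayList<>();

    /** Adds the session if it is not queued yet and returns its 1-based position. */
    synchronized int enqueue(String sessionId) {
        int idx = sessions.indexOf(sessionId);
        if (idx < 0) {
            sessions.add(sessionId);
            idx = sessions.size() - 1;
        }
        return idx + 1;
    }

    /** Returns the 1-based position, or -1 if the session is not queued. */
    synchronized int positionOf(String sessionId) {
        int idx = sessions.indexOf(sessionId);
        return idx < 0 ? -1 : idx + 1;
    }

    /** Removes and returns the session at the head of the queue, or null if empty. */
    synchronized String pollNext() {
        return sessions.isEmpty() ? null : sessions.remove(0);
    }
}
```

Making `enqueue` idempotent means a user who clicks "transfer to human" twice keeps the original position instead of being pushed to the back.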
9. Operations Back Office: Monitoring, Auditing, and Dashboards
9.1 Conversation Audit Service
java
@Service
@Slf4j
@RequiredArgsConstructor // Lombok: generates the constructor that injects the final fields below
public class ConversationAuditService {
private final ConversationRepository repository;
private final KafkaTemplate<String, Object> kafkaTemplate;
/**
* Record every conversation message (asynchronously, off the main request path)
*/
@Async
public void auditMessage(String sessionId, String userId,
MessageRole role, String content,
IntentResult intent, long latencyMs) {
ConversationRecord record = ConversationRecord.builder()
.sessionId(sessionId)
.userId(userId)
.role(role)
.content(desensitizeContent(content)) // mask sensitive data
.intent(intent != null ? intent.getIntent() : null)
.confidence(intent != null ? intent.getConfidence() : null)
.latencyMs(latencyMs)
.timestamp(Instant.now())
.build();
repository.save(record);
// Publish to Kafka for real-time analytics
kafkaTemplate.send("conversation-audit", sessionId, record);
}
/**
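* Content masking: phone numbers, ID card numbers, bank card numbers.
* The digit-boundary lookarounds stop a shorter rule from firing inside a
* longer digit run (for example the phone rule inside an 18-digit ID number).
*/
private String desensitizeContent(String content) {
if (content == null) return null;
return content
.replaceAll("(?<!\\d)(\\d{3})\\d{4}(\\d{4})(?!\\d)", "$1****$2") // phone number (11 digits)
.replaceAll("(?<!\\d)(\\d{6})\\d{8}(\\d{3}[\\dXx])(?![\\dXx])", "$1********$2") // ID card (18 characters)
.replaceAll("(?<!\\d)(\\d{4})\\d{8,12}(\\d{4})(?!\\d)", "$1****$2"); // bank card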
}
}
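The three masking rules run one after another over the same string, so how they interact matters as much as each pattern on its own. Below is a standalone sketch for checking them in isolation. It assumes digit-boundary lookarounds, so that each rule only fires on a whole digit run; the class name `SensitiveDataMasker` is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical standalone masker covering the phone / ID card / bank card cases. */
final class SensitiveDataMasker {
    // (?<!\d) and (?!\d) pin each rule to a whole digit run, so the 11-digit
    // rule cannot fire inside an 18-digit ID or a 13-digit order number.
    private static final Pattern PHONE =
            Pattern.compile("(?<!\\d)(\\d{3})\\d{4}(\\d{4})(?!\\d)");
    private static final Pattern ID_CARD =
            Pattern.compile("(?<!\\d)(\\d{6})\\d{8}(\\d{3}[\\dXx])(?![\\dXx])");
    private static final Pattern BANK_CARD =
            Pattern.compile("(?<!\\d)(\\d{4})\\d{8,12}(\\d{4})(?!\\d)");

    /** Applies the three rules in sequence; returns null for null input. */
    static String mask(String content) {
        if (content == null) return null;
        String out = PHONE.matcher(content).replaceAll("$1****$2");
        out = ID_CARD.matcher(out).replaceAll("$1********$2");
        return BANK_CARD.matcher(out).replaceAll("$1****$2");
    }
}
```

With these boundaries, a 13-digit order number such as the one used later in the walkthrough passes through untouched, which keeps audit records searchable by order ID.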
9.2 Satisfaction Analytics API
java
@RestController
@RequestMapping("/api/analytics")
@Slf4j
@RequiredArgsConstructor // Lombok: generates the constructor that injects the final fields below
public class AnalyticsController {
private final AnalyticsService analyticsService;
private final MeterRegistry meterRegistry;
/**
* Dashboard data: real-time statistics
*/
@GetMapping("/dashboard/realtime")
public RealtimeDashboard getRealtimeDashboard() {
return RealtimeDashboard.builder()
.activeSessions(analyticsService.getActiveSessionCount())
.todayTotalConversations(analyticsService.getTodayConversationCount())
.aiResolvedRate(analyticsService.getAIResolvedRate())
.avgResponseLatency(analyticsService.getAvgResponseLatency())
.humanQueueLength(analyticsService.getHumanQueueLength())
.topIntents(analyticsService.getTopIntents(10))
.satisfactionScore(analyticsService.getAvgSatisfactionScore())
.hotKnowledgeItems(analyticsService.getHotKnowledgeItems(5))
.timestamp(Instant.now())
.build();
}
/**
* Knowledge-base heat analysis: which questions are asked most often
*/
@GetMapping("/knowledge/heatmap")
public List<KnowledgeHeatItem> getKnowledgeHeatmap(
@RequestParam(defaultValue = "7") int days,
@RequestParam(defaultValue = "20") int topN) {
return analyticsService.getKnowledgeHeatmap(days, topN);
}
/**
* Intent distribution analysis
*/
@GetMapping("/intent/distribution")
public IntentDistribution getIntentDistribution(
@RequestParam @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate startDate,
@RequestParam @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate endDate) {
return analyticsService.getIntentDistribution(startDate, endDate);
}
/**
* Satisfaction trend
*/
@GetMapping("/satisfaction/trend")
public List<SatisfactionDataPoint> getSatisfactionTrend(
@RequestParam(defaultValue = "30") int days) {
return analyticsService.getSatisfactionTrend(days);
}
/**
* Submit user satisfaction feedback
*/
@PostMapping("/satisfaction/submit")
public ResponseEntity<Void> submitSatisfaction(
@RequestBody SatisfactionFeedback feedback) {
analyticsService.saveFeedback(feedback);
// A low score flags the conversation for a second review
if (feedback.getScore() <= 2) {
analyticsService.flagForReview(feedback.getSessionId());
}
return ResponseEntity.ok().build();
}
}
9.3 Prometheus Metrics
java
@Component
@Slf4j
public class CustomerServiceMetrics {
private final MeterRegistry registry;
// Custom metrics
private final Counter aiCallCounter;
private final Counter humanTransferCounter;
private final Timer responseTimer;
private final Gauge activeSessionGauge;
private final DistributionSummary tokenUsageSummary;
public CustomerServiceMetrics(MeterRegistry registry,
SessionManager sessionManager) {
this.registry = registry;
this.aiCallCounter = Counter.builder("cs.ai.calls.total")
.description("Total number of AI service calls")
.tag("service", "customer-service")
.register(registry);
this.humanTransferCounter = Counter.builder("cs.human.transfer.total")
.description("Total number of handovers to human agents")
.register(registry);
this.responseTimer = Timer.builder("cs.response.latency")
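.description("AI response latency")
.publishPercentiles(0.5, 0.9, 0.99)
.publishPercentileHistogram() // also export _bucket series so histogram_quantile() works in PromQL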
.register(registry);
this.activeSessionGauge = Gauge.builder("cs.sessions.active",
sessionManager, SessionManager::getActiveSessionCount)
.description("Number of currently active sessions")
.register(registry);
this.tokenUsageSummary = DistributionSummary.builder("cs.token.usage")
.description("Token usage statistics")
.baseUnit("tokens")
.register(registry);
}
public void recordAICall(String intent, long latencyMs, boolean success) {
aiCallCounter.increment();
responseTimer.record(latencyMs, TimeUnit.MILLISECONDS);
registry.counter("cs.ai.calls.by_intent",
"intent", intent != null ? intent : "unknown", // tag values must not be null
"status", success ? "success" : "error")
.increment();
}
public void recordHumanTransfer(TransferReason reason) {
humanTransferCounter.increment();
registry.counter("cs.human.transfer.by_reason",
"reason", reason.name()).increment();
}
public void recordTokenUsage(String model, int inputTokens, int outputTokens) {
tokenUsageSummary.record(inputTokens + outputTokens);
registry.counter("cs.token.usage.by_model",
"model", model, "type", "input").increment(inputTokens);
registry.counter("cs.token.usage.by_model",
"model", model, "type", "output").increment(outputTokens);
}
}
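Prometheus sees these meters under different names from the ones registered in Java, which is an easy source of alert rules that silently match nothing. The sketch below is a simplified approximation of that mapping, not Micrometer's actual code: it assumes dots become underscores, counters get a `_total` suffix, and timers get a `_seconds` suffix. The class name `PrometheusNames` is hypothetical.

```java
/** Simplified approximation of Micrometer's Prometheus naming rules (not the library code). */
final class PrometheusNames {
    enum Kind { COUNTER, TIMER, GAUGE }

    /** Returns the base series name a meter is expected to have in Prometheus. */
    static String exported(String meterName, Kind kind) {
        String name = meterName.replace('.', '_');
        switch (kind) {
            case COUNTER:
                // Counters end in _total; the suffix is not duplicated if already present
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are reported in seconds
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                return name;
        }
    }
}
```

Under these assumptions, `cs.ai.calls.by_intent` (the counter that carries the `status` tag) shows up as `cs_ai_calls_by_intent_total`, and `cs.response.latency` as `cs_response_latency_seconds`. Checking the service's `/actuator/prometheus` output remains the reliable way to confirm the real names.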
9.4 Prometheus Alerting Rules
yaml
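# prometheus/alerts/customer-service.yml
groups:
  - name: customer-service-alerts
    rules:
      # AI service error rate too high.
      # The status tag lives on cs.ai.calls.by_intent (exported as
      # cs_ai_calls_by_intent_total), so the ratio is computed from that counter.
      - alert: AIServiceHighErrorRate
        expr: |
          sum(rate(cs_ai_calls_by_intent_total{status="error"}[5m])) /
          sum(rate(cs_ai_calls_by_intent_total[5m])) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "AI service error rate above 10%"
          description: "Current error rate: {{ $value | humanizePercentage }}"
      # P95 response latency too high.
      # Requires histogram buckets on the cs.response.latency Timer
      # (publishPercentileHistogram); client-side percentiles alone export no _bucket series.
      - alert: AIResponseLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(cs_response_latency_seconds_bucket[5m]))) > 10
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "AI response P95 latency above 10 seconds"
      # Unusual rise in handovers to human agents (AI quality may have dropped)
      - alert: HumanTransferRateSpike
        expr: |
          rate(cs_human_transfer_total[10m]) >
          rate(cs_human_transfer_total[1h] offset 1d) * 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Handovers to human agents are rising abnormally; check AI service quality"
      # Too many active sessions (capacity alert)
      - alert: TooManyActiveSessions
        expr: cs_sessions_active > 1000
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Active sessions above threshold: {{ $value }}"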
10. Spring Cloud Gateway Configuration
java
@Configuration
public class GatewayConfig {
@Bean
public RouteLocator customerServiceRoutes(RouteLocatorBuilder builder) {
return builder.routes()
// WebSocket route
.route("session-ws", r -> r
.path("/ws/**")
.uri("lb:ws://session-service"))
// AI service route (rate limited)
.route("ai-service", r -> r
.path("/api/ai/**")
.filters(f -> f
.requestRateLimiter(config -> config
.setRateLimiter(redisRateLimiter())
.setKeyResolver(userKeyResolver()))
.circuitBreaker(config -> config
.setName("aiServiceBreaker")
.setFallbackUri("forward:/fallback/ai")))
.uri("lb://ai-service"))
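// Knowledge-base management (internal only); the header below merely tags requests,
// so the service must still enforce access control on its own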
.route("knowledge-service", r -> r
.path("/api/knowledge/**")
.filters(f -> f
.addRequestHeader("X-Internal-Call", "true"))
.uri("lb://knowledge-service"))
// Analytics service route
.route("analytics-service", r -> r
.path("/api/analytics/**")
.uri("lb://analytics-service"))
.build();
}
@Bean
public RedisRateLimiter redisRateLimiter() {
// Per user: at most 10 requests per second, with bursts of up to 20
return new RedisRateLimiter(10, 20, 1);
}
@Bean
public KeyResolver userKeyResolver() {
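// Rate limit by user ID; fall back to the client IP for anonymous requests
return exchange -> {
String userId = exchange.getRequest().getHeaders()
.getFirst("X-User-Id");
if (userId != null) return Mono.just(userId);
// getRemoteAddress() may return null, so guard before dereferencing it
var remote = exchange.getRequest().getRemoteAddress();
return Mono.just(remote != null && remote.getAddress() != null
? remote.getAddress().getHostAddress()
: "unknown");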
};
}
}
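The three arguments to `RedisRateLimiter` (replenish rate, burst capacity, tokens per request) describe a token bucket. To make the "10 per second, bursts of 20" behaviour concrete, here is a minimal in-memory sketch of the same idea. It is not the gateway's implementation, which keeps its state in Redis; the class name `TokenBucket` is hypothetical, and the clock is passed in explicitly so the behaviour is easy to reason about.

```java
/** Hypothetical in-memory token bucket illustrating replenish rate and burst capacity. */
final class TokenBucket {
    private final double replenishRate;   // tokens added per second
    private final double burstCapacity;   // maximum tokens the bucket can hold
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double replenishRate, double burstCapacity, long nowMillis) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;      // start full, so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Tries to take requestedTokens at time nowMillis; returns true if the request is allowed. */
    synchronized boolean tryAcquire(int requestedTokens, long nowMillis) {
        double elapsedSeconds = Math.max(0, nowMillis - lastRefillMillis) / 1000.0;
        // Refill in proportion to elapsed time, capped at the burst capacity
        tokens = Math.min(burstCapacity, tokens + elapsedSeconds * replenishRate);
        lastRefillMillis = nowMillis;
        if (tokens < requestedTokens) return false;
        tokens -= requestedTokens;
        return true;
    }
}
```

With a rate of 10 and a capacity of 20, a user can fire 20 requests at once, after which requests are admitted at roughly 10 per second until the bucket refills.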
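11. Walking Through an End-to-End Call
Let's walk the whole flow from start to finish:
User sends: "My order 1234567890123 still hasn't arrived. Where is it?"
│
▼
[Gateway] Token validation → rate-limit check → route to session-service
│
▼
[WebSocket Handler] Receives the message and parses the payload
│
▼
[Circuit Breaker] Resilience4j checks the breaker state
│(CLOSED, so the call proceeds normally)
▼
[Intent Recognition]
→ Intent recognized: ORDER_QUERY, confidence 0.97
→ Slot extracted: order_id=1234567890123
│
▼
[Request Router] intent=ORDER_QUERY → take the Agent path
│
▼
[CustomerServiceAgent]
→ Thought: the user wants shipping status, so call the queryLogistics tool
→ Call queryLogistics("1234567890123")
→ Tool returns: [Nanjing, Jiangsu: arrived at the sorting center, delivery expected tomorrow]
→ Generate the reply (streamed)
│
▼
[WebSocket Handler] Pushes the reply to the user chunk by chunk
[Session Manager] Updates the session context and records the message
[Audit Service] Writes the conversation audit log asynchronously
[Metrics] Records call latency and token usage
User receives: "Your order 1234567890123 is currently at the [Nanjing, Jiangsu sorting center]
and is expected to be delivered tomorrow (March 21). Please keep an eye out for it.
If it doesn't arrive on time, contact me and I can request expedited delivery ^^"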
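The Request Router step in the flow above boils down to a small decision: which pipeline handles a recognized intent, and what happens when the model is unsure. Here is a minimal sketch of that decision. Everything except the `ORDER_QUERY` intent is an assumption made for illustration: the `FAQ` intent name, the 0.6 confidence cut-off, and the class name `RequestRouter` are all hypothetical.

```java
/** Hypothetical router; only ORDER_QUERY comes from the walkthrough, the rest is assumed. */
final class RequestRouter {
    enum Route { AGENT, RAG, FALLBACK }

    private static final double MIN_CONFIDENCE = 0.6; // assumed cut-off, tune per business

    /** Picks a pipeline for the recognized intent. */
    static Route route(String intent, double confidence) {
        if (intent == null || confidence < MIN_CONFIDENCE) {
            return Route.FALLBACK;            // unsure: ask a clarifying question or hand over
        }
        switch (intent) {
            case "ORDER_QUERY":
                return Route.AGENT;           // needs live data through tool calls
            case "FAQ":
                return Route.RAG;             // answerable from the knowledge base
            default:
                return Route.FALLBACK;
        }
    }
}
```

Keeping this mapping in one place, away from the intent recognizer and the pipelines themselves, is what lets each piece be tested on its own.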
12. Production Deployment Notes
12.1 JVM Tuning
bash
# JVM options for the AI service (it handles a large number of asynchronous Reactor streams)
JAVA_OPTS="-server \
-Xms2g -Xmx4g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:+UseStringDeduplication \
-Dreactor.netty.pool.maxConnections=500 \
-Dreactor.netty.http.server.accessLogEnabled=true"
12.2 Quick Start with Docker Compose
yaml
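# docker-compose.yml (development / test environments)
version: '3.8'
services:
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
    command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru
  # Milvus standalone depends on etcd and MinIO. Both are defined here so that
  # ETCD_ENDPOINTS and MINIO_ADDRESS below resolve; the image tags follow the
  # Milvus standalone compose file and can be adjusted as needed.
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    command: minio server /minio_data
  milvus:
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    ports: ["19530:19530"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    depends_on: [etcd, minio]
  ai-service:
    build: ./ai-service
    ports: ["8081:8081"]
    environment:
      DASHSCOPE_API_KEY: ${DASHSCOPE_API_KEY}
      REDIS_HOST: redis
      MILVUS_HOST: milvus
    depends_on: [redis, milvus]
    deploy:
      resources:
        limits:
          memory: 4G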
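12.3 Reference Targets for Key Performance Metrics
| Metric | Target | Notes |
|---|---|---|
| Intent recognition latency | < 500ms (P99) | Uses the lightweight qwen-turbo model |
| RAG Q&A latency | < 3s (P95) | Includes vector retrieval + LLM inference |
| Agent tool calling | < 8s (P95) | Includes 1-2 tool calls |
| Concurrent sessions | 500+ | Single node |
| Knowledge-base ingestion rate | 1,000 items/minute | Batch import |
| Circuit-breaker recovery time | 30s | Half-open probing |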
13. FAQ and Tuning Notes
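Q: The documents RAG retrieves are unrelated to the question. What should I do?
First check whether similarityThreshold is set too low (0.72 or higher is a reasonable starting point for customer service). Next check the chunk size: 512-token chunks suit FAQ-style Q&A, but product manuals may need smaller 256-token chunks. Finally, make sure the embedding model matches the one used to build the vector store: if you have switched models, you must re-ingest the entire knowledge base.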
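Q: Intent recognition confidence is low and requests keep falling through to the fallback logic?
Improve the intent-recognition prompt by adding a few typical example utterances for each intent (few-shot). Setting the temperature to 0.0 also matters a great deal for intent recognition, because it keeps the output as deterministic as possible. If the business scenario is complex, consider replacing the general-purpose model with a fine-tuned, dedicated model for intent classification.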
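Q: WebSocket connections keep dropping behind Nginx?
Configure Nginx with proxy_read_timeout 3600s and proxy_send_timeout 3600s, and have the client send a heartbeat message every 30 seconds so that intermediaries do not time out and close the idle connection.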
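Q: Token usage is too high. How do I control cost?
Switch intent recognition to qwen-turbo (roughly 1/10 the cost of qwen-max). Trim the RAG system prompt to 200 characters or fewer. Keep the conversation history window at 10-15 messages and replace anything older with a summary. For simple FAQ questions, you can also add a semantic cache (Redis): questions with similarity > 0.98 are answered straight from the cache without calling the LLM.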
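The semantic cache mentioned in the last answer comes down to comparing the embedding of a new question against the embeddings of questions already answered. Below is a minimal in-memory sketch of that lookup. It assumes embeddings arrive as `double[]` vectors from whatever embedding model is in use, it scans entries linearly, and the class name `SemanticCache` is hypothetical; a Redis-backed version would replace the list with a vector index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory semantic cache keyed by embedding similarity. */
final class SemanticCache {
    private static final class Entry {
        final double[] embedding;
        final String answer;
        Entry(double[] embedding, String answer) {
            this.embedding = embedding;
            this.answer = answer;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    SemanticCache(double threshold) { this.threshold = threshold; }

    void put(double[] embedding, String answer) {
        entries.add(new Entry(embedding.clone(), answer));
    }

    /** Returns the most similar cached answer whose similarity exceeds the threshold, else null. */
    String lookup(double[] query) {
        String best = null;
        double bestScore = threshold;
        for (Entry e : entries) {
            double score = cosine(query, e.embedding);
            if (score > bestScore) {
                bestScore = score;
                best = e.answer;
            }
        }
        return best;
    }

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A threshold as strict as 0.98 trades hit rate for safety: near-duplicates are served from the cache, while anything merely similar still goes to the model.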
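Summary
This article has put the core capabilities of Spring AI Alibaba 1.1 to work in a real intelligent customer service system:
Intent recognition (slot filling)
↓ routing
RAG knowledge-base Q&A ←→ Agent tool calling
↓ ↓
Streaming output (WebSocket)
↓
Circuit breaking and fallback → human agents
↓
Micrometer monitoring → Prometheus → Grafana dashboards
Every module can be tested independently, and the whole system is decoupled through Spring Cloud microservices. In a real rollout you can trim it to fit your scale: a small team can start by merging the AI layer and the session layer, then split them once the load grows.
That wraps up the series. From quick start to enterprise practice, I hope these 15 articles help you put AI to real use in the Java ecosystem. The point is building products, not playing with demos.
Version note: all code in this article is based on Spring AI Alibaba 1.1.2.2 + Spring Boot 3.3.5 + JDK 17 and has been validated on a similar architecture in production. Apply for a DashScope API key at https://dashscope.aliyun.com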