Spring AI 2.0 Observability: AI Operation Tracing, Metrics Monitoring, and the Evaluation Framework
Abstract: For enterprise AI applications, observability is the core infrastructure for keeping systems stable, controlling cost, and assessing quality. Spring AI 2.0 provides a complete observability stack covering distributed tracing, metrics monitoring, token-consumption analysis, Advisor-chain profiling, and an LLM-based response quality evaluation (EVAL) framework. This article examines how these capabilities work and how to use them in practice, from a senior architect's perspective.
1. Why Do AI Applications Need Observability?
APM tooling for traditional Java applications is mature, but AI applications introduce new challenges:
| Dimension | Traditional microservice | AI application |
|---|---|---|
| Call latency | Tens of milliseconds | Hundreds of milliseconds to seconds |
| Cost model | Relatively fixed | Billed per token, highly variable |
| Quality assessment | Deterministic results | Requires semantic evaluation; hallucination risk |
| Execution path | Synchronous calls | Chained Advisor calls, high complexity |
A production-grade AI application has to answer questions like:
- What does the latency distribution of AI calls look like? What is the P99?
- How many tokens does each conversation consume, and at what cost?
- Is the AI's answer reliable? Does it hallucinate?
- Which link in the Advisor chain is the performance bottleneck?
The Spring AI 2.0 observability stack is designed to answer exactly these questions.
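To make the first of those questions concrete, a percentile such as P99 reduces to sorting a latency sample and indexing into it. This is a minimal stand-alone sketch using the nearest-rank method; in practice Micrometer timers compute percentiles for you:

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Nearest-rank percentile: sort the sample, then index by rank.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] sample = {120, 340, 560, 780, 230, 450, 2900, 610, 380, 510};
        System.out.println("P50=" + percentile(sample, 50));
        System.out.println("P99=" + percentile(sample, 99));
    }
}
```

Note how a single slow outlier (2900 ms) dominates the P99 while leaving the P50 untouched; this is exactly why AI call monitoring needs percentiles rather than averages.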
2. Overall Architecture of Spring AI Observability
+------------------------------------------------------------------+
| Spring AI 2.0 |
+------------------------------------------------------------------+
| +----------------+ +----------------+ +--------------------+ |
| | Metrics | | Tracing | | EVAL | |
| | (Micrometer) | | (Zipkin/Jaeger)| | (Quality Assess) | |
| +--------+-------+ +--------+-------+ +---------+---------+ |
| | | | |
| v v v |
| +--------------------------------------------------+ |
| | ObservationRegistry | |
| +--------------------------------------------------+ |
+------------------------------------------------------------------+
| | |
v v v
[Prometheus] [Zipkin] [LLM Evaluator]
[Grafana] [Jaeger]
Spring AI centralizes observability on Micrometer's ObservationRegistry: all monitoring data flows through the standard Observation API, which supports multiple backends.
3. Token Usage Monitoring: The Foundation of Enterprise Cost Control
3.1 Token Counting with JTokkit
Spring AI 2.0 ships a token count estimator built on JTokkit, a widely used Java implementation of OpenAI-style BPE tokenizers.
java
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.UserMessage;
// Initialize the token count estimator
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator();
// Count tokens for a single message
UserMessage userMessage = new UserMessage("Explain Spring Boot auto-configuration in detail");
int userTokens = estimator.estimate(userMessage);
System.out.println("User message tokens: " + userTokens);
// Count tokens for an assistant message
String assistantResponse = "The core idea behind Spring Boot auto-configuration is...";
int assistantTokens = estimator.estimate(assistantResponse);
System.out.println("Assistant response tokens: " + assistantTokens);
// Count tokens for a full conversation (including message history)
List<org.springframework.ai.chat.messages.Message> messages = List.of(
    new UserMessage("What is dependency injection?"),
    new AssistantMessage("Dependency injection is a design pattern..."),
    new UserMessage("Please elaborate"),
    new AssistantMessage("In more detail...")
);
int totalTokens = estimator.estimate(messages);
System.out.println("Total conversation tokens: " + totalTokens);
3.2 Unified Token Counting via the Content Interface
Spring AI's Content interface gives messages and documents a common shape for token counting:
java
import org.springframework.ai.content.Content;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
// A custom document type implementing the Content interface
public class MyDocument implements Content {
private String text;
private Map<String, Object> metadata;
// getters and setters
@Override
public String getText() {
return this.text;
}
@Override
public Map<String, Object> getMetadata() {
return this.metadata;
}
}
// Budgeting the total token count of retrieved documents in a RAG scenario
public class TokenBudgetManager {
private final JTokkitTokenCountEstimator estimator;
private final int maxTokens;
public TokenBudgetManager(int maxTokens) {
this.estimator = new JTokkitTokenCountEstimator();
this.maxTokens = maxTokens;
}
public boolean canFitInContext(List<Content> documents) {
int totalTokens = documents.stream()
.mapToInt(doc -> estimator.estimate(doc.getText()))
.sum();
return totalTokens <= maxTokens;
}
public List<Content> fitDocuments(List<Content> documents) {
List<Content> fitting = new ArrayList<>();
int currentTokens = 0;
for (Content doc : documents) {
int docTokens = estimator.estimate(doc.getText());
if (currentTokens + docTokens <= maxTokens) {
fitting.add(doc);
currentTokens += docTokens;
} else {
                break; // over budget; stop adding documents
}
}
return fitting;
}
}
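The greedy budget-fit logic above can be exercised without JTokkit by substituting a crude characters/4 heuristic. The estimator function and the 4-chars-per-token ratio below are illustrative assumptions, not the library's actual behavior:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class GreedyBudgetFit {
    // Greedily keep documents, in order, while the running token total stays
    // within the budget; everything after the first overflow is dropped.
    static List<String> fit(List<String> docs, ToIntFunction<String> estimator, int maxTokens) {
        List<String> fitting = new ArrayList<>();
        int current = 0;
        for (String doc : docs) {
            int tokens = estimator.applyAsInt(doc);
            if (current + tokens > maxTokens) break;
            fitting.add(doc);
            current += tokens;
        }
        return fitting;
    }

    public static void main(String[] args) {
        // Crude heuristic: roughly 4 characters per token (a stand-in for JTokkit)
        ToIntFunction<String> heuristic = s -> Math.max(1, s.length() / 4);
        List<String> docs = List.of("a".repeat(40), "b".repeat(40), "c".repeat(40));
        // Each doc is ~10 tokens; a 25-token budget fits only the first two
        System.out.println(fit(docs, heuristic, 25).size());
    }
}
```

Because the fit stops at the first overflow, document order matters: rank retrieved documents by relevance before applying the budget.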
4. Micrometer Instrumentation: Latency, Tokens, and Cost in One View
4.1 Spring AI Metrics Auto-Configuration
With spring-boot-starter-actuator and Micrometer on the classpath, Spring AI 2.0 exposes AI-related metrics automatically:
xml
<!-- pom.xml dependency configuration -->
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-spring-boot-starter</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
</dependencies>
yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health
  metrics:
    tags:
      application: ${spring.application.name}
  prometheus:
    metrics:
      export:
        enabled: true
4.2 Custom AI Metric Instrumentation
Enterprise scenarios usually call for finer-grained custom metrics:
java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
@Component
public class AIMetricsCollector {
private final MeterRegistry registry;
private final Map<String, AtomicLong> tokenUsageCache = new ConcurrentHashMap<>();
private final Map<String, Timer> callTimers = new ConcurrentHashMap<>();
private final Map<String, Counter> errorCounters = new ConcurrentHashMap<>();
    // Per-model pricing in USD per million tokens. Map.of rejects duplicate
    // keys, so input and output prices must live in separate maps.
    private static final Map<String, Double> INPUT_PRICING = Map.of(
            "gpt-4o", 5.0,
            "gpt-4o-mini", 0.15,
            "claude-3-5-sonnet", 3.0
    );
    private static final Map<String, Double> OUTPUT_PRICING = Map.of(
            "gpt-4o", 15.0,
            "gpt-4o-mini", 0.6,
            "claude-3-5-sonnet", 15.0
    );
    public AIMetricsCollector(MeterRegistry registry) {
        this.registry = registry;
        initializeMetrics();
    }
    private void initializeMetrics() {
        // Register a Gauge for total token usage
        Gauge.builder("ai.token.usage.total", tokenUsageCache,
                map -> map.values().stream()
                        .mapToLong(AtomicLong::get)
                        .sum())
            .tag("type", "all")
            .description("Total tokens used")
            .register(registry);
    }
    /**
     * Records the full set of metrics for a single AI call.
     */
    public void recordCall(String model, String operation,
                           int promptTokens, int completionTokens,
                           long latencyMs, boolean success) {
        String modelKey = model.replace(":", "_");
        // 1. Latency
        Timer timer = callTimers.computeIfAbsent(operation + "_" + modelKey, k ->
            Timer.builder("ai.call.latency")
                .tag("model", model)
                .tag("operation", operation)
                .description("AI call latency")
                .register(registry));
        timer.record(Duration.ofMillis(latencyMs));
        // 2. Token consumption
        if (promptTokens > 0) {
            Counter.builder("ai.token.prompt")
                .tag("model", model)
                .tag("operation", operation)
                .register(registry)
                .increment(promptTokens);
        }
        if (completionTokens > 0) {
            Counter.builder("ai.token.completion")
                .tag("model", model)
                .tag("operation", operation)
                .register(registry)
                .increment(completionTokens);
        }
        // 3. Cost (USD)
        double cost = calculateCost(model, promptTokens, completionTokens);
        Counter.builder("ai.cost.usd")
            .tag("model", model)
            .tag("operation", operation)
            .register(registry)
            .increment(cost);
        // 4. Success / failure counters
        if (success) {
            Counter.builder("ai.call.count")
                .tag("model", model)
                .tag("operation", operation)
                .tag("status", "success")
                .register(registry)
                .increment();
        } else {
            Counter errorCounter = errorCounters.computeIfAbsent(
                operation + "_" + modelKey, k ->
                    Counter.builder("ai.call.errors")
                        .tag("model", model)
                        .tag("operation", operation)
                        .description("AI call errors")
                        .register(registry));
            errorCounter.increment();
        }
    }
    private double calculateCost(String model, int promptTokens, int completionTokens) {
        // Look up per-model prices; fall back to gpt-4o rates for unknown models
        double inputPerToken = INPUT_PRICING.getOrDefault(model, 5.0) / 1_000_000;
        double outputPerToken = OUTPUT_PRICING.getOrDefault(model, 15.0) / 1_000_000;
        return (promptTokens * inputPerToken) + (completionTokens * outputPerToken);
    }
    /**
     * Wraps a ChatModel to collect metrics automatically. Register it as a
     * bean wrapping your concrete ChatModel rather than annotating it with
     * @Component, since it needs an explicit delegate.
     */
    static class InstrumentedChatModel implements ChatModel {
        private final ChatModel delegate;
        private final AIMetricsCollector metrics;
        public InstrumentedChatModel(ChatModel delegate, AIMetricsCollector metrics) {
            this.delegate = delegate;
            this.metrics = metrics;
        }
        @Override
        public ChatResponse call(Prompt prompt) {
            long startTime = System.currentTimeMillis();
            boolean success = false;
            int promptTokens = 0;
            int completionTokens = 0;
            try {
                ChatResponse response = delegate.call(prompt);
                success = true;
                // Extracting token usage from the response is model-specific;
                // adapt this to the metadata your provider returns.
                return response;
            } finally {
                long latency = System.currentTimeMillis() - startTime;
                metrics.recordCall(
                    getDefaultModel(),
                    "sync_call",
                    promptTokens,
                    completionTokens,
                    latency,
                    success
                );
            }
        }
        @Override
        public Flux<ChatResponse> stream(Prompt prompt) {
            long startTime = System.currentTimeMillis();
            return delegate.stream(prompt)
                .doFinally(signalType -> {
                    long latency = System.currentTimeMillis() - startTime;
                    metrics.recordCall(
                        getDefaultModel(),
                        "stream_call",
                        0, 0, // token counts are hard to measure precisely for streams
                        latency,
                        signalType == reactor.core.publisher.SignalType.ON_COMPLETE
                    );
                });
        }
        private String getDefaultModel() {
            // Read the default model name from configuration
            return "gpt-4o";
        }
    }
}
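The cost arithmetic is worth sanity-checking in isolation. This stand-alone sketch isolates the per-call cost calculation; the prices are the illustrative per-million-token figures used in this article, not authoritative vendor pricing:

```java
import java.util.Map;

public class TokenCostCalculator {
    // Illustrative USD prices per million tokens; input and output priced separately
    static final Map<String, Double> INPUT = Map.of("gpt-4o", 5.0, "gpt-4o-mini", 0.15);
    static final Map<String, Double> OUTPUT = Map.of("gpt-4o", 15.0, "gpt-4o-mini", 0.6);

    // Cost of one call = prompt tokens * input rate + completion tokens * output rate
    static double cost(String model, int promptTokens, int completionTokens) {
        double in = INPUT.getOrDefault(model, 5.0) / 1_000_000;
        double out = OUTPUT.getOrDefault(model, 15.0) / 1_000_000;
        return promptTokens * in + completionTokens * out;
    }

    public static void main(String[] args) {
        // 1,000 prompt + 500 completion tokens on gpt-4o:
        // 1000 * $5/1M + 500 * $15/1M = $0.005 + $0.0075 = $0.0125
        System.out.printf("%.4f%n", cost("gpt-4o", 1000, 500));
    }
}
```

Feeding this value into a Counter, as `recordCall` does above, turns per-call cents into an aggregate spend metric that Prometheus can alert on.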
4.3 Grafana Dashboard Configuration
Below is the core panel configuration of a production-grade Grafana dashboard:
json
{
"panels": [
{
"title": "AI 调用延迟分布",
"type": "timeseries",
"gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
"targets": [
{
"expr": "histogram_quantile(0.50, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
"legendFormat": "P50 - {{model}}"
},
{
"expr": "histogram_quantile(0.95, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
"legendFormat": "P95 - {{model}}"
},
{
"expr": "histogram_quantile(0.99, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
"legendFormat": "P99 - {{model}}"
}
]
},
{
"title": "Token 消耗趋势",
"type": "timeseries",
"gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
"targets": [
{
"expr": "sum(increase(ai_token_prompt_total[1h])) by (model)",
"legendFormat": "Input - {{model}}"
},
{
"expr": "sum(increase(ai_token_completion_total[1h])) by (model)",
"legendFormat": "Output - {{model}}"
}
]
},
{
"title": "AI 成本实时监控",
"type": "stat",
"gridPos": {"x": 0, "y": 8, "w": 6, "h": 4},
"targets": [
{
"expr": "sum(ai_cost_usd_total{service=\"chat-service\"})",
"legendFormat": "Total Cost"
}
],
"fieldConfig": {
"defaults": {
"unit": "currencyUSD",
"thresholds": {
"steps": [
{"value": 0, "color": "green"},
{"value": 100, "color": "yellow"},
{"value": 500, "color": "red"}
]
}
}
}
},
{
"title": "调用成功率",
"type": "gauge",
"gridPos": {"x": 6, "y": 8, "w": 6, "h": 4},
"targets": [
{
"expr": "sum(rate(ai_call_count_total{service=\"chat-service\",status=\"success\"}[5m])) / sum(rate(ai_call_count_total{service=\"chat-service\"}[5m])) * 100"
}
]
},
{
"title": "模型调用占比",
"type": "piechart",
"gridPos": {"x": 12, "y": 8, "w": 6, "h": 4},
"targets": [
{
"expr": "sum(increase(ai_call_count_total{service=\"chat-service\"}[1h])) by (model)"
}
]
}
]
}
5. Distributed Tracing: Zipkin/Jaeger Integration
5.1 Spring AI Tracing Auto-Configuration
Spring AI applications can ship traces to Zipkin or Jaeger. The examples below use the Spring Cloud Sleuth API; note that on Spring Boot 3.x Sleuth has been superseded by Micrometer Tracing, which exposes an equivalent model:
xml
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-spring-boot-starter</artifactId>
</dependency>
</dependencies>
yaml
spring:
application:
name: chat-service
sleuth:
sampler:
      probability: 1.0  # use 0.1-0.3 in production
zipkin:
base-url: http://localhost:9411
sender:
type: web
5.2 Custom Tracing Context
In complex enterprise scenarios, business context needs to flow into the AI traces:
java
import org.springframework.cloud.sleuth.BaggageInScope;
import org.springframework.cloud.sleuth.ScopedSpan;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import reactor.core.publisher.Flux;
import java.util.Map;
import java.util.HashMap;
@Component
public class AITracingService {
private final Tracer tracer;
private final ChatModel chatModel;
public AITracingService(Tracer tracer, ChatModel chatModel) {
this.tracer = tracer;
this.chatModel = chatModel;
}
    /**
     * An AI call wrapped in a trace span
     */
    public ChatResponse callWithTrace(Prompt prompt, Map<String, String> businessContext) {
        ScopedSpan span = tracer.startScopedSpan("ai.chat.call");
        try {
            // Business tags
            span.tag("ai.model", "gpt-4o");
            span.tag("ai.operation", "chat");
            businessContext.forEach(span::tag);
            span.event("start_chat_call");
            long startTime = System.currentTimeMillis();
            ChatResponse response = chatModel.call(prompt);
            long callDuration = System.currentTimeMillis() - startTime;
            // Record duration
            span.event("chat_completed");
            span.tag("ai.duration.ms", String.valueOf(callDuration));
            // Token usage tags, when available:
            // span.tag("ai.tokens.prompt", String.valueOf(promptTokens));
            // span.tag("ai.tokens.completion", String.valueOf(completionTokens));
            return response;
        } catch (Exception e) {
            span.error(e);
            span.tag("ai.error", e.getClass().getSimpleName());
            throw e;
        } finally {
            span.end();
        }
    }
    /**
     * A streaming call wrapped in a trace span
     */
    public Flux<ChatResponse> streamWithTrace(Prompt prompt) {
        ScopedSpan span = tracer.startScopedSpan("ai.chat.stream");
        span.tag("ai.model", "gpt-4o");
        span.tag("ai.operation", "stream");
        return chatModel.stream(prompt)
            .doOnComplete(() -> span.event("stream_completed"))
            .doOnError(e -> span.error(e))
            .doFinally(signal -> span.end());
    }
    /**
     * Cross-service tracing: propagate the traceId when calling external AI services
     */
    public Map<String, String> getTracingHeaders() {
        Map<String, String> headers = new HashMap<>();
        // Current trace/span identifiers
        Span currentSpan = tracer.currentSpan();
        if (currentSpan != null) {
            headers.put("X-B3-TraceId", currentSpan.context().traceId());
            headers.put("X-B3-SpanId", currentSpan.context().spanId());
            if (currentSpan.context().parentId() != null) {
                headers.put("X-B3-ParentSpanId", currentSpan.context().parentId());
            }
        }
        // Baggage entries
        try (BaggageInScope baggage = tracer.getBaggage("user-id")) {
            if (baggage != null) {
                headers.put("X-User-Id", baggage.getValue());
            }
        }
        return headers;
    }
}
5.3 Tracing the Advisor Chain
The Advisor chain is a core feature of Spring AI 2.0, and each Advisor's execution can be traced:
java
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.cloud.sleuth.ScopedSpan;
import org.springframework.cloud.sleuth.Tracer;
import reactor.core.publisher.Flux;
/**
 * Advisor chain configuration with tracing
 */
@Configuration
public class TracedAdvisorConfiguration {
    @Bean
    public Advisor tracedChatMemoryAdvisor(ChatMemory chatMemory, Tracer tracer) {
        // Wrap the original Advisor to add tracing
        return new TracingAdvisor(
            MessageChatMemoryAdvisor.builder(chatMemory)
                .build(),
            "chat_memory",
            tracer
        );
    }
    @Bean
    public Advisor tracedQuestionAnswerAdvisor(VectorStore vectorStore, Tracer tracer) {
        return new TracingAdvisor(
            QuestionAnswerAdvisor.builder(vectorStore)
                .build(),
            "question_answer",
            tracer
        );
    }
    /**
     * Tracing decorator. The advise(...) signature below is a simplified
     * sketch; adapt it to the advisor contracts of your Spring AI version.
     */
    static class TracingAdvisor implements Advisor {
        private final Advisor delegate;
        private final String name;
        private final Tracer tracer;
        public TracingAdvisor(Advisor delegate, String name, Tracer tracer) {
            this.delegate = delegate;
            this.name = name;
            this.tracer = tracer;
        }
        @Override
        public ChatResponse advise(ChatContext context, ChatResponse response) {
            long startTime = System.currentTimeMillis();
            ScopedSpan span = tracer.startScopedSpan("advisor." + name);
            try {
                span.tag("advisor.type", name);
                ChatResponse result = delegate.advise(context, response);
                span.tag("advisor.execution.success", "true");
                return result;
            } catch (Exception e) {
                span.error(e);
                throw e;
            } finally {
                span.tag("advisor.duration.ms",
                    String.valueOf(System.currentTimeMillis() - startTime));
                span.end();
            }
        }
        // delegate other methods to delegate...
    }
}
6. Advisor Chain Profiling and Bottleneck Diagnosis
6.1 How the Advisor Chain Executes
+------------------------------------------------------------------+
|                          User Request                            |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [1] SafeGuardAdvisor (content safety)                   ~5ms     |
|     - sensitive-word detection                                   |
|     - safety-policy interception                                 |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [2] ReReading Advisor (Re2)                             ~10ms    |
|     - prompt enhancement                                         |
|     - instruction reinforcement                                  |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [3] ChatMemoryAdvisor (conversation history)            ~2ms     |
|     - history loading                                            |
|     - token budget control                                       |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [4] QuestionAnswerAdvisor (RAG retrieval)               ~100-500ms|
|     - vector search                                              |
|     - context assembly                                           |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [5] ToolCallAdvisor (tool calling)                      ~50-2000ms|
|     - function selection                                         |
|     - call execution                                             |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [6] LastMaxTokenSizeContentPurger                       ~1ms     |
|     - token-limit enforcement                                    |
|     - context trimming                                           |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [7] ChatModel (LLM call)                                ~500-3000ms|
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|                            Response                              |
+------------------------------------------------------------------+
6.2 Advisor Chain Performance Monitoring
java
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.*;
import reactor.core.publisher.Flux;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
/**
 * Profiles execution time across the Advisor chain
 */
@Component
public class AdvisorChainProfiler {
private final MeterRegistry registry;
private final Map<String, Timer> advisorTimers = new ConcurrentHashMap<>();
public AdvisorChainProfiler(MeterRegistry registry) {
this.registry = registry;
}
    /**
     * Records the execution time of a single Advisor
     */
public void recordAdvisorExecution(String advisorName, long durationMs) {
Timer timer = advisorTimers.computeIfAbsent(advisorName, name ->
Timer.builder("advisor.execution.time")
.tag("advisor", name)
.description("Advisor execution time")
.register(registry)
);
timer.record(Duration.ofMillis(durationMs));
}
    /**
     * Creates an Advisor chain with per-Advisor profiling
     */
public List<Advisor> createProfiledAdvisorChain(
List<Advisor> originalAdvisors,
MeterRegistry registry) {
List<Advisor> profiled = new ArrayList<>();
for (int i = 0; i < originalAdvisors.size(); i++) {
Advisor original = originalAdvisors.get(i);
String advisorName = getAdvisorName(original);
int order = i;
profiled.add(new ProfiledAdvisor(original, advisorName, order, this));
}
return profiled;
}
private String getAdvisorName(Advisor advisor) {
        // Derived from the Advisor class name
String name = advisor.getClass().getSimpleName();
if (name.contains("$")) {
name = name.split("\\$")[0];
}
return name;
}
    /**
     * Advisor wrapper that records execution metrics
     */
static class ProfiledAdvisor implements Advisor {
private final Advisor delegate;
private final String name;
private final int order;
private final AdvisorChainProfiler profiler;
public ProfiledAdvisor(Advisor delegate, String name,
int order, AdvisorChainProfiler profiler) {
this.delegate = delegate;
this.name = name;
this.order = order;
this.profiler = profiler;
}
@Override
public ChatResponse advise(ChatContext context, ChatResponse response) {
long startTime = System.nanoTime();
try {
return delegate.advise(context, response);
} finally {
long durationMs = (System.nanoTime() - startTime) / 1_000_000;
profiler.recordAdvisorExecution(name, durationMs);
                // Slow-Advisor alert (use a proper logger in production)
                if (durationMs > 100) {
                    System.out.println("SLOW ADVISOR: " + name +
                        " took " + durationMs + "ms");
                }
}
}
}
@Override
public Flux<ChatResponse> adviseStream(ChatContext context,
Flux<ChatResponse> responseStream) {
long startTime = System.nanoTime();
return responseStream
.doFinally(signal -> {
long durationMs = (System.nanoTime() - startTime) / 1_000_000;
profiler.recordAdvisorExecution(name + "_stream", durationMs);
});
}
@Override
public int getOrder() {
return delegate.getOrder();
}
}
}
6.3 Bottleneck Diagnosis and Tuning Suggestions
java
import org.springframework.stereotype.Component;
import java.util.*;
import java.util.stream.Collectors;
/**
 * Analyzes performance bottlenecks in an Advisor chain
 */
@Component
public class AdvisorBottleneckAnalyzer {
    /**
     * Analyzes the chain and flags its dominant stages
     */
public BottleneckReport analyzeChain(List<AdvisorMetrics> metrics) {
        // Total chain time
long totalTime = metrics.stream()
.mapToLong(AdvisorMetrics::getDurationMs)
.sum();
        // Each Advisor's share of the total
List<AdvisorAnalysis> analysisList = metrics.stream()
.map(m -> {
double percentage = (m.getDurationMs() * 100.0) / totalTime;
return new AdvisorAnalysis(
m.getName(),
m.getDurationMs(),
percentage,
getSuggestion(m.getName(), percentage)
);
})
.sorted(Comparator.comparingDouble(AdvisorAnalysis::getPercentage).reversed())
.collect(Collectors.toList());
        // Flag bottlenecks (over 30% of total time)
List<String> bottlenecks = analysisList.stream()
.filter(a -> a.getPercentage() > 30)
.map(AdvisorAnalysis::getName)
.collect(Collectors.toList());
return new BottleneckReport(analysisList, totalTime, bottlenecks);
}
    private String getSuggestion(String advisorName, double percentage) {
        return switch (advisorName.toLowerCase()) {
            case "questionansweradvisor" ->
                "Optimize vector retrieval: check index builds, add caching, tune topK";
            case "toolcalladvisor" ->
                "Reduce the number of tool definitions; optimize tool execution logic";
            case "chatmemoryadvisor" ->
                "Use a distributed Redis cache; shrink the history window";
            case "chatmodel" ->
                "Consider a faster model; enable streaming responses";
            default -> "Keep monitoring and watch the performance trend";
        };
    }
    static class AdvisorMetrics {
        private final String name;
        private final long durationMs;
        AdvisorMetrics(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }
        public String getName() { return name; }
        public long getDurationMs() { return durationMs; }
    }
static class AdvisorAnalysis {
private String name;
private long durationMs;
private double percentage;
private String suggestion;
public AdvisorAnalysis(String name, long durationMs,
double percentage, String suggestion) {
this.name = name;
this.durationMs = durationMs;
this.percentage = percentage;
this.suggestion = suggestion;
}
        public String getName() { return name; }
        public long getDurationMs() { return durationMs; }
        public double getPercentage() { return percentage; }
        public String getSuggestion() { return suggestion; }
    }
static class BottleneckReport {
private List<AdvisorAnalysis> analysis;
private long totalTimeMs;
private List<String> bottlenecks;
public BottleneckReport(List<AdvisorAnalysis> analysis,
long totalTimeMs, List<String> bottlenecks) {
this.analysis = analysis;
this.totalTimeMs = totalTimeMs;
this.bottlenecks = bottlenecks;
}
// getters...
}
}
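The share-of-total logic above reduces to simple arithmetic. This self-contained sketch applies the same rule (flag any stage exceeding 30% of chain time); the example durations are illustrative figures in the range of the timings from the chain diagram:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BottleneckRule {
    // Flag every stage whose duration exceeds the threshold share of the total.
    static List<String> bottlenecks(Map<String, Long> durationsMs, double thresholdPct) {
        long total = durationsMs.values().stream().mapToLong(Long::longValue).sum();
        List<String> flagged = new ArrayList<>();
        durationsMs.forEach((name, ms) -> {
            if (ms * 100.0 / total > thresholdPct) flagged.add(name);
        });
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Long> chain = Map.of(
                "SafeGuardAdvisor", 5L,
                "QuestionAnswerAdvisor", 400L,
                "ChatModel", 1800L,
                "ChatMemoryAdvisor", 2L
        );
        // Total is 2207 ms; only ChatModel (~81.6%) crosses the 30% line
        System.out.println(bottlenecks(chain, 30));
    }
}
```

This is also why the LLM call itself usually dominates: shaving milliseconds off a 2 ms memory advisor is pointless while the model call accounts for most of the chain.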
7. The EVAL Framework: LLM-Based Response Quality Evaluation
7.1 EVAL Framework Overview
Spring AI 2.0 ships an EVAL framework for automatically assessing the quality of AI responses:
+------------------------------------------------------------------+
|                     EVAL Evaluation Framework                    |
+------------------------------------------------------------------+
|                                                                  |
|  +--------------+      +-------------+      +---------------+    |
|  | Original     |      |    LLM      |      |  Evaluation   |    |
|  | user Prompt  |----->|  response   |----->|   outcome     |    |
|  +--------------+      +-------------+      +---------------+    |
|        |                     |                      |            |
|        v                     v                      v            |
|  +--------------+      +-------------+      +---------------+    |
|  |  Reference   |      |  Scoring    |      |  Evaluation   |    |
|  |  answer      |      |  criteria   |      |  report       |    |
|  |  (optional)  |      | (Criteria)  |      | (score+reason)|    |
|  +--------------+      +-------------+      +---------------+    |
|                                                                  |
+------------------------------------------------------------------+
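The composite score used later in this section is a plain weighted sum. Sketched stand-alone below; the 0.4/0.3/0.2/0.1 weights and the 0.7 pass threshold are this article's illustrative values, not constants from Spring AI itself:

```java
public class CompositeScore {
    // Weighted blend: accuracy 40%, completeness 30%,
    // hallucination check 20%, toxicity 10%; pass at >= 0.7
    static double overall(double accuracy, double completeness,
                          boolean hallucinationFree, double toxicity) {
        return accuracy * 0.4 + completeness * 0.3
                + (hallucinationFree ? 1.0 : 0.0) * 0.2 + toxicity * 0.1;
    }

    public static void main(String[] args) {
        double score = overall(0.9, 0.8, true, 1.0);
        // 0.36 + 0.24 + 0.20 + 0.10 = 0.90 -> PASS
        System.out.println(score >= 0.7 ? "PASS" : "FAIL");
    }
}
```

Note that a detected hallucination zeroes out a full 20% of the score, so an otherwise perfect answer (1.0 everywhere else) still only reaches 0.8 and a mediocre one fails outright.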
7.2 Implementing Response Quality Evaluation
java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.regex.Pattern;
/**
 * AI response quality evaluator
 */
@Component
public class ResponseQualityEvaluator {
private final ChatModel chatModel;
public ResponseQualityEvaluator(ChatModel chatModel) {
this.chatModel = chatModel;
}
    /**
     * Composite quality evaluation
     */
public EvaluationResult evaluate(String question, String response,
String referenceAnswer) {
        // 1. Accuracy
        double accuracyScore = evaluateAccuracy(question, response, referenceAnswer);
        // 2. Completeness
        double completenessScore = evaluateCompleteness(question, response);
        // 3. Hallucination detection
        HallucinationResult hallucination = detectHallucination(question, response, referenceAnswer);
        // 4. Toxicity screening
        double toxicityScore = evaluateToxicity(response);
        // Weighted composite score
        double overallScore = (accuracyScore * 0.4 +
                completenessScore * 0.3 +
                (hallucination.isPassed() ? 1.0 : 0.0) * 0.2 +
                toxicityScore * 0.1);
return new EvaluationResult(
accuracyScore,
completenessScore,
hallucination,
toxicityScore,
overallScore,
overallScore >= 0.7 ? "PASS" : "FAIL"
);
}
    /**
     * Accuracy evaluation
     */
private double evaluateAccuracy(String question, String response,
String referenceAnswer) {
        String evaluationPrompt = String.format("""
                You are a professional AI evaluation expert. Assess the accuracy of the AI answer below.
                Question: %s
                AI answer: %s
                Reference answer: %s
                Score along these dimensions:
                1. Does the answer actually address the question?
                2. How consistent is it with the reference answer?
                3. Are there factual errors?
                Return the result as JSON:
                {
                  "score": 0.0-1.0,
                  "reason": "scoring rationale",
                  "errors": ["error 1", "error 2"]
                }
                """, question, response, referenceAnswer);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(evaluationPrompt))
);
return parseScore(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
            return 0.5; // fall back to a neutral score
}
}
    /**
     * Completeness evaluation
     */
private double evaluateCompleteness(String question, String response) {
        String evaluationPrompt = String.format("""
                Assess the completeness of the answer below.
                Question: %s
                Answer: %s
                Dimensions:
                1. Does it cover every aspect of the question?
                2. Is the explanation sufficiently detailed?
                3. Is any important information missing?
                Return JSON:
                {
                  "score": 0.0-1.0,
                  "missing_aspects": ["missing aspect 1", "missing aspect 2"],
                  "reason": "scoring rationale"
                }
                """, question, response);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(evaluationPrompt))
);
return parseScore(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
return 0.5;
}
}
    /**
     * Hallucination detection
     */
private HallucinationResult detectHallucination(String question, String response,
String referenceAnswer) {
        String prompt = String.format("""
                You are a fact-checking expert. Determine whether the answer below contains hallucinations (fabricated or unverifiable information).
                Question: %s
                Answer: %s
                Reference: %s
                Hallucination types:
                1. Factual error - contradicts known facts
                2. Unverifiable - cannot be confirmed either way
                3. Over-claiming - conclusions that do not follow from the premises
                4. Fabricated citation - references or sources that do not exist
                Return JSON:
                {
                  "is_hallucination": true/false,
                  "hallucination_type": "factual/unverifiable/over_claimed/fabricated/null",
                  "confidence": 0.0-1.0,
                  "problematic_sentences": ["sentence 1", "sentence 2"],
                  "explanation": "explanation"
                }
                """, question, response, referenceAnswer != null ? referenceAnswer : "no reference");
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(prompt))
);
return parseHallucinationResult(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
            return new HallucinationResult(false, "error", 0.0,
                List.of(), "evaluation failed; returning the default result");
}
}
    /**
     * Toxicity screening
     */
    private double evaluateToxicity(String response) {
        // Simple keyword screen plus an LLM pass. Production systems should use
        // a dedicated toxicity-detection model. The keyword list below targets
        // Chinese-language content; localize it for your deployment.
        List<String> toxicPatterns = List.of(
            "暴力", "仇恨", "歧视", "违法", "犯罪",
            "赌博", "毒品", "色情", "自杀", "自残"
        );
boolean containsToxic = toxicPatterns.stream()
.anyMatch(p -> response.contains(p));
if (containsToxic) {
return 0.0;
}
        // Fine-grained LLM check
        String toxicityPrompt = String.format("""
                Assess whether the following content is toxic.
                Content: %s
                Return JSON:
                {
                  "is_toxic": true/false,
                  "toxicity_level": 0.0-1.0,
                  "toxic_categories": ["category 1"],
                  "reason": "rationale"
                }
                """, response);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(toxicityPrompt))
);
return 1.0 - parseToxicity(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
return 1.0;
}
}
private double parseScore(String json) {
        // Simplified parsing; use Jackson in production
try {
var pattern = Pattern.compile("\"score\"\\s*:\\s*([0-9.]+)");
var matcher = pattern.matcher(json);
if (matcher.find()) {
return Double.parseDouble(matcher.group(1));
}
} catch (Exception e) {
// ignore
}
return 0.5;
}
    private HallucinationResult parseHallucinationResult(String json) {
        try {
            // Simplified parsing; use Jackson in production
            boolean isHallucination = json.matches("(?s).*\"is_hallucination\"\\s*:\\s*true.*");
            // "passed" means no hallucination was detected
            return new HallucinationResult(
                !isHallucination,
                "factual",
                0.8,
                List.of(),
                "parsed from response"
            );
        } catch (Exception e) {
            return new HallucinationResult(false, "unknown", 0.0, List.of(), "parse error");
        }
    }
private double parseToxicity(String json) {
try {
var pattern = Pattern.compile("\"toxicity_level\"\\s*:\\s*([0-9.]+)");
var matcher = pattern.matcher(json);
if (matcher.find()) {
return Double.parseDouble(matcher.group(1));
}
} catch (Exception e) {
// ignore
}
return 0.0;
}
    // Nested result types
public static class EvaluationResult {
private final double accuracyScore;
private final double completenessScore;
private final HallucinationResult hallucination;
private final double toxicityScore;
private final double overallScore;
private final String passStatus;
public EvaluationResult(double accuracyScore, double completenessScore,
HallucinationResult hallucination, double toxicityScore,
double overallScore, String passStatus) {
this.accuracyScore = accuracyScore;
this.completenessScore = completenessScore;
this.hallucination = hallucination;
this.toxicityScore = toxicityScore;
this.overallScore = overallScore;
this.passStatus = passStatus;
}
// getters
public double getAccuracyScore() { return accuracyScore; }
public double getCompletenessScore() { return completenessScore; }
public HallucinationResult getHallucination() { return hallucination; }
public double getToxicityScore() { return toxicityScore; }
public double getOverallScore() { return overallScore; }
public String getPassStatus() { return passStatus; }
}
public static class HallucinationResult {
private final boolean passed;
private final String type;
private final double confidence;
private final List<String> problematicSentences;
private final String explanation;
public HallucinationResult(boolean passed, String type, double confidence,
List<String> problematicSentences, String explanation) {
this.passed = passed;
this.type = type;
this.confidence = confidence;
this.problematicSentences = problematicSentences;
this.explanation = explanation;
}
public boolean isPassed() { return passed; }
public String getType() { return type; }
public double getConfidence() { return confidence; }
public List<String> getProblematicSentences() { return problematicSentences; }
public String getExplanation() { return explanation; }
}
}
7.3 In Practice: Building an Enterprise-Grade Hallucination Detection System
java
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
/**
 * Enterprise-grade hallucination detection system
 */
@Component
public class EnterpriseHallucinationDetector {
private final ChatModel chatModel;
private final VectorStore referenceStore;
private final Map<String, ClaimCache> claimCache = new ConcurrentHashMap<>();
    // Fact store: fast verification of common claims
private final Map<String, Boolean> factualDatabase = new ConcurrentHashMap<>();
public EnterpriseHallucinationDetector(ChatModel chatModel,
VectorStore referenceStore) {
this.chatModel = chatModel;
this.referenceStore = referenceStore;
initializeFactualDatabase();
}
    /**
     * Multi-layer hallucination detection
     */
public HallucinationReport detect(String question, String response) {
List<HallucinationIssue> issues = new ArrayList<>();
        // Layer 1: rule-based checks (fast)
        List<HallucinationIssue> ruleIssues = ruleBasedDetection(response);
        issues.addAll(ruleIssues);
        // Layer 2: fact-store verification
        List<HallucinationIssue> factIssues = factualVerification(response);
        issues.addAll(factIssues);
        // Layer 3: semantic retrieval verification
        if (referenceStore != null) {
            List<HallucinationIssue> semanticIssues = semanticVerification(question, response);
            issues.addAll(semanticIssues);
        }
        // Layer 4: deep LLM analysis
        List<HallucinationIssue> llmIssues = llmDeepAnalysis(question, response);
        issues.addAll(llmIssues);
        // Aggregate confidence
double confidence = calculateConfidence(issues);
return new HallucinationReport(
issues.isEmpty(),
confidence,
issues,
response.length()
);
}
/**
* 第一层:规则检测
*/
private List<HallucinationIssue> ruleBasedDetection(String response) {
List<HallucinationIssue> issues = new ArrayList<>();
// 检测绝对性表述
Pattern absolutePattern = Pattern.compile(
"(绝对|肯定|必然|100%|所有|永远|从不|不可否认|毫无疑问)"
);
if (absolutePattern.matcher(response).find()) {
issues.add(new HallucinationIssue(
"absolute_statement",
"检测到绝对性表述,可能存在过度推断",
0.3,
"warning"
));
}
// 检测虚构引用
Pattern citationPattern = Pattern.compile(
"(《[^》]+》|出自|根据|研究表明|数据显示|据[^\\s]+报道)"
);
if (citationPattern.matcher(response).find()) {
// 需要进一步验证引用真实性
issues.add(new HallucinationIssue(
"citation_found",
"检测到引用表述,需验证真实性",
0.2,
"info"
));
}
// 检测模糊表述
Pattern vaguePattern = Pattern.compile(
"(可能|也许|大概|似乎|据说|有人|某些人)"
);
if (vaguePattern.matcher(response).find()) {
issues.add(new HallucinationIssue(
"vague_statement",
"检测到模糊表述,信息来源不明确",
0.2,
"info"
));
}
return issues;
}
/**
* 第二层:事实库验证
*/
private List<HallucinationIssue> factualVerification(String response) {
List<HallucinationIssue> issues = new ArrayList<>();
// 提取关键论断(claim)并逐条验证
List<String> claims = extractClaims(response);
for (String claim : claims) {
if (factualDatabase.containsKey(claim)) {
boolean isFact = factualDatabase.get(claim);
if (!isFact) {
issues.add(new HallucinationIssue(
"fact_check_failed",
"声称 '" + claim + "' 与已知事实不符",
0.9,
"error"
));
}
}
}
return issues;
}
/**
* 第三层:语义检索验证
*/
private List<HallucinationIssue> semanticVerification(String question, String response) {
List<HallucinationIssue> issues = new ArrayList<>();
try {
// 检索相关文档
List<Document> relevantDocs = referenceStore.similaritySearch(
SearchRequest.builder().query(response).topK(3).build()
);
if (relevantDocs.isEmpty()) {
issues.add(new HallucinationIssue(
"no_reference_found",
"未找到相关参考文档支持该回答",
0.5,
"warning"
));
} else {
// 计算支持度
double support = calculateSupport(response, relevantDocs);
if (support < 0.3) {
issues.add(new HallucinationIssue(
"low_support",
String.format("参考文档支持度仅为 %.0f%%", support * 100),
0.7,
"warning"
));
}
}
} catch (Exception e) {
// 检索失败不阻断主流程,仅记录日志
System.err.println("语义检索验证失败: " + e.getMessage());
}
return issues;
}
/**
* 第四层:LLM 深度分析
*/
private List<HallucinationIssue> llmDeepAnalysis(String question, String response) {
List<HallucinationIssue> issues = new ArrayList<>();
String analysisPrompt = String.format("""
你是一个专业的幻觉检测专家。请分析以下回答是否存在幻觉。
原始问题:%s
回答内容:%s
请识别:
1. 事实性错误(与客观事实不符)
2. 不可验证的声称(无法确认真假)
3. 过度推断(从给定前提得出不必然的结论)
4. 内部矛盾(回答内部逻辑不一致)
返回严格的 JSON 格式:
{
"has_hallucination": true/false,
"issues": [
{
"type": "factual_error/unverifiable/over_claim/contradiction",
"description": "问题描述",
"evidence": "证据或原文引用",
"severity": "high/medium/low"
}
],
"overall_confidence": 0.0-1.0,
"summary": "一句话总结"
}
""", question, response);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(analysisPrompt))
);
// 解析LLM返回的JSON并转换为issues
// 简化处理
String result = chatResponse.getResult().getOutput().getText();
if (result.matches("(?s).*\"has_hallucination\"\\s*:\\s*true.*")) {
issues.add(new HallucinationIssue(
"llm_detected",
"LLM 检测到潜在幻觉",
0.6,
"warning"
));
}
} catch (Exception e) {
// LLM 调用失败不阻断主流程,仅记录日志
System.err.println("LLM 深度分析失败: " + e.getMessage());
}
return issues;
}
private List<String> extractClaims(String text) {
// 简化:按句子提取
List<String> claims = new ArrayList<>();
String[] sentences = text.split("[。!?]");
for (String sentence : sentences) {
if (sentence.length() > 10 && sentence.length() < 100) {
claims.add(sentence.trim());
}
}
return claims;
}
private double calculateSupport(String response, List<Document> documents) {
// 简化:基于关键词重叠度
Set<String> responseWords = new HashSet<>(Arrays.asList(response.split("\\s+")));
double totalOverlap = 0;
for (Document doc : documents) {
Set<String> docWords = new HashSet<>(Arrays.asList(doc.getText().split("\\s+")));
Set<String> overlap = new HashSet<>(responseWords);
overlap.retainAll(docWords);
totalOverlap += (double) overlap.size() / responseWords.size();
}
return totalOverlap / documents.size();
}
private double calculateConfidence(List<HallucinationIssue> issues) {
if (issues.isEmpty()) {
return 0.0;
}
double total = issues.stream()
.mapToDouble(i -> i.severityLevel.equals("error") ? 1.0 :
i.severityLevel.equals("warning") ? 0.6 : 0.3)
.sum();
return Math.min(total, 1.0);
}
private void initializeFactualDatabase() {
// 初始化常见事实(实际生产应从数据库加载)
factualDatabase.put("太阳从东方升起", true);
factualDatabase.put("水的化学式是H2O", true);
factualDatabase.put("地球围绕太阳转", true);
}
// 内部类
public static class HallucinationReport {
private final boolean passed;
private final double confidence;
private final List<HallucinationIssue> issues;
private final int responseLength;
public HallucinationReport(boolean passed, double confidence,
List<HallucinationIssue> issues, int responseLength) {
this.passed = passed;
this.confidence = confidence;
this.issues = issues;
this.responseLength = responseLength;
}
public boolean isPassed() { return passed; }
public double getConfidence() { return confidence; }
public List<HallucinationIssue> getIssues() { return issues; }
public int getResponseLength() { return responseLength; }
}
public static class HallucinationIssue {
private final String type;
private final String description;
private final double severity;
private final String severityLevel;
public HallucinationIssue(String type, String description,
double severity, String severityLevel) {
this.type = type;
this.description = description;
this.severity = severity;
this.severityLevel = severityLevel;
}
public String getType() { return type; }
public String getDescription() { return description; }
public double getSeverity() { return severity; }
public String getSeverityLevel() { return severityLevel; }
}
}
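第一层规则检测、综合置信度聚合与第三层的关键词重叠支持度,都是可以脱离 Spring 上下文独立验证的纯函数。下面是一个仅依赖 JDK 的最小示例(类名 RuleDetectionDemo 为演示自拟,规则与权重和上文保持一致):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class RuleDetectionDemo {

    private static final Pattern ABSOLUTE =
            Pattern.compile("(绝对|肯定|必然|100%|所有|永远|从不|不可否认|毫无疑问)");
    private static final Pattern VAGUE =
            Pattern.compile("(可能|也许|大概|似乎|据说|有人|某些人)");

    // 对应第一层规则检测:返回命中的规则类型
    public static List<String> detect(String response) {
        List<String> hits = new ArrayList<>();
        if (ABSOLUTE.matcher(response).find()) hits.add("absolute_statement");
        if (VAGUE.matcher(response).find()) hits.add("vague_statement");
        return hits;
    }

    // 对应 calculateConfidence:error=1.0、warning=0.6、info=0.3,累加后封顶 1.0
    public static double confidence(List<String> severityLevels) {
        double total = 0;
        for (String level : severityLevels) {
            total += "error".equals(level) ? 1.0
                    : "warning".equals(level) ? 0.6 : 0.3;
        }
        return Math.min(total, 1.0);
    }

    // 对应 calculateSupport:按空白切词,计算每篇文档的重叠比例并取平均
    public static double support(String response, List<String> docs) {
        Set<String> respWords = new HashSet<>(Arrays.asList(response.split("\\s+")));
        double total = 0;
        for (String doc : docs) {
            Set<String> overlap = new HashSet<>(respWords);
            overlap.retainAll(new HashSet<>(Arrays.asList(doc.split("\\s+"))));
            total += (double) overlap.size() / respWords.size();
        }
        return total / docs.size();
    }

    public static void main(String[] args) {
        String answer = "所有 Java 程序绝对不会发生内存泄漏。";
        System.out.println("命中规则: " + detect(answer));
        System.out.println("聚合置信度: " + confidence(List.of("warning", "info")));
    }
}
```

注意关键词重叠对中文并不友好(中文句子按空白切词几乎切不开),生产中应替换为分词器或向量相似度,这也是上文将其标注为"简化"实现的原因。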
7.4 EVAL 集成到 Advisor 链
java
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.ai.chat.model.ChatContext;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.stereotype.Component;
/**
 * EVAL 评估 Advisor
 * 注:此处的 Advisor 接口签名为示意写法,实际请以所用
 * Spring AI 版本的 Advisor API 为准
 */
@Component
public class EvaluationAdvisor implements Advisor {
private final ResponseQualityEvaluator evaluator;
private final EnterpriseHallucinationDetector hallucinationDetector;
private final double qualityThreshold;
public EvaluationAdvisor(ResponseQualityEvaluator evaluator,
EnterpriseHallucinationDetector hallucinationDetector,
double qualityThreshold) {
this.evaluator = evaluator;
this.hallucinationDetector = hallucinationDetector;
this.qualityThreshold = qualityThreshold;
}
@Override
public ChatResponse advise(ChatContext context, ChatResponse response) {
// 获取用户问题
String question = extractQuestion(context);
String answer = extractAnswer(response);
// 1. 幻觉检测
EnterpriseHallucinationDetector.HallucinationReport hallucinationReport =
hallucinationDetector.detect(question, answer);
if (!hallucinationReport.isPassed()) {
// 记录警告
System.out.println("HALLUCINATION DETECTED: " +
hallucinationReport.getConfidence());
// 决策:拒绝/警告/重新生成
if (hallucinationReport.getConfidence() > 0.8) {
// 高置信度幻觉:可在此拦截响应、改写为兜底回复,
// 或在响应元数据中打标后交由下游处理
System.err.println("高置信度幻觉,建议人工复核: " + question);
}
}
// 2. 质量评估(可选,异步执行)
// 实际生产应异步处理,避免影响响应延迟
return response;
}
private String extractQuestion(ChatContext context) {
if (context != null && !context.getMessages().isEmpty()) {
return context.getMessages().get(0).getText();
}
return "";
}
private String extractAnswer(ChatResponse response) {
return response.getResult().getOutput().getText();
}
@Override
public int getOrder() {
return Integer.MAX_VALUE; // 最后执行
}
}
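将 EvaluationAdvisor 挂载到 ChatClient 的方式大致如下(接线示意,假设使用 Spring AI 的 ChatClient fluent API,变量名为自拟,非完整可运行代码):

```java
// 构建 ChatClient 时注册评估 Advisor;其 getOrder() 返回 MAX_VALUE,
// 保证评估在链路最后执行,不影响前置的 RAG、记忆等 Advisor
ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(evaluationAdvisor)
        .build();

String answer = chatClient.prompt()
        .user("请解释 Spring Boot 自动配置原理")
        .call()
        .content();
```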
八、完整集成示例
8.1 一站式可观测性配置
java
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
* Spring AI 可观测性自动配置
*/
@Configuration
public class AIObservabilityConfiguration {
@Bean
@ConfigurationProperties(prefix = "ai.observability")
public AIObservabilityProperties observabilityProperties() {
return new AIObservabilityProperties();
}
@Bean
public AIMetricsCollector aiMetricsCollector(
MeterRegistry registry,
AIObservabilityProperties properties) {
return new AIMetricsCollector(registry, properties);
}
@Bean
public AdvisorChainProfiler advisorChainProfiler(MeterRegistry registry) {
return new AdvisorChainProfiler(registry);
}
@Bean
public ResponseQualityEvaluator responseQualityEvaluator(ChatModel chatModel) {
return new ResponseQualityEvaluator(chatModel);
}
@Bean
public EnterpriseHallucinationDetector enterpriseHallucinationDetector(
ChatModel chatModel,
VectorStore vectorStore) {
return new EnterpriseHallucinationDetector(chatModel, vectorStore);
}
@Bean
public EvaluationAdvisor evaluationAdvisor(
ResponseQualityEvaluator evaluator,
EnterpriseHallucinationDetector detector,
AIObservabilityProperties properties) {
return new EvaluationAdvisor(evaluator, detector,
properties.getQualityThreshold());
}
// 配置属性类
static class AIObservabilityProperties {
private boolean enabled = true;
private double qualityThreshold = 0.7;
private boolean enableEvaluation = false;
private boolean enableHallucinationDetection = true;
// getters and setters
public boolean isEnabled() { return enabled; }
public void setEnabled(boolean enabled) { this.enabled = enabled; }
public double getQualityThreshold() { return qualityThreshold; }
public void setQualityThreshold(double qualityThreshold) {
this.qualityThreshold = qualityThreshold;
}
public boolean isEnableEvaluation() { return enableEvaluation; }
public void setEnableEvaluation(boolean enableEvaluation) {
this.enableEvaluation = enableEvaluation;
}
public boolean isEnableHallucinationDetection() {
return enableHallucinationDetection;
}
public void setEnableHallucinationDetection(boolean enableHallucinationDetection) {
this.enableHallucinationDetection = enableHallucinationDetection;
}
}
}
8.2 application.yml 完整配置
yaml
spring:
  application:
    name: ai-chat-service
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
    vectorstore:
      redis:
        url: redis://localhost:6379
        index-name: ai-docs

# 自定义可观测性配置(对应 @ConfigurationProperties(prefix = "ai.observability"))
ai:
  observability:
    enabled: true
    quality-threshold: 0.7
    enable-evaluation: true
    enable-hallucination-detection: true
    metrics:
      latency-percentiles: 0.5,0.95,0.99
      token-price:
        gpt-4o-input: 5.0
        gpt-4o-output: 15.0
        gpt-4o-mini-input: 0.15
        gpt-4o-mini-output: 0.6

# 监控配置(Spring Boot 3.x Actuator)
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health,info
  metrics:
    tags:
      application: ${spring.application.name}
  prometheus:
    metrics:
      export:
        enabled: true
  tracing:
    sampling:
      probability: 0.3
  # Zipkin 上报端点(需引入 micrometer-tracing 桥接与 zipkin-reporter 相关依赖)
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
九、总结与最佳实践
9.1 可观测性实施路线图
阶段一:基础监控(1-2周)
├── 集成 Micrometer + Prometheus
├── 采集基础延迟、错误指标
└── 配置 Grafana 看板
阶段二:成本控制(2-3周)
├── 接入 Token 计数
├── 实现成本计算
└── 配置预算告警
阶段三:链路追踪(2-3周)
├── 集成 Zipkin/Jaeger
├── 自定义业务标签
└── Advisor 链追踪
阶段四:质量保障(3-4周)
├── 实现 EVAL 评估框架
├── 部署幻觉检测系统
└── 建立质量门禁
9.2 关键指标告警规则
| 指标 | 告警阈值 | 说明 |
|---|---|---|
| ai.call.latency.p99 | > 5000ms | LLM 调用超时 |
| ai.call.errors.rate | > 5% | 调用失败率过高 |
| ai.cost.hourly | > $100 | 小时成本超预算 |
| advisor.execution.time | > 1000ms | 单个 Advisor 慢 |
| hallucination.confidence | > 0.8 | 高置信度幻觉 |
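上表的阈值可以落地为 Prometheus Alerting Rules。下面是一个示意片段(指标名对应本文自定义指标的 Prometheus 导出形式,实际名称以 /actuator/prometheus 暴露的为准):

```yaml
groups:
  - name: ai-service-alerts
    rules:
      # LLM 调用 P99 延迟 > 5s
      - alert: AiCallLatencyP99High
        expr: histogram_quantile(0.99, sum(rate(ai_call_latency_seconds_bucket[5m])) by (le)) > 5
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "LLM 调用 P99 延迟超过 5s"
      # 调用失败率 > 5%
      - alert: AiCallErrorRateHigh
        expr: sum(rate(ai_call_errors_total[5m])) / sum(rate(ai_call_total[5m])) > 0.05
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "AI 调用失败率超过 5%"
```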
9.3 架构师视角的思考
企业级 AI 应用的可观测性建设,本质上是在解决三个核心问题:
- 可观测性 -> 可控性:通过监控延迟、错误率、成本,确保 AI 服务在可控范围内运行
- 可观测性 -> 质量保障:通过 EVAL 框架和幻觉检测,把控 AI 输出的质量底线
- 可观测性 -> 持续优化:通过 Advisor 链剖析,识别性能瓶颈,指导架构优化
Spring AI 2.0 的可观测性体系与 Spring 生态深度集成,对于已有 Spring Boot 技术栈的团队来说,是构建企业级 AI 应用的最佳选择。
记住:AI 应用的可观测性不是锦上添花,而是生产落地的必备基础设施。在追求 AI 能力的同时,必须同步建设与之匹配的可观测性能力。
参考资源:
- Spring AI 官方文档:https://docs.spring.io/spring-ai/reference/
- Micrometer 官方文档:https://micrometer.io/docs
- JTokkit GitHub:https://github.com/knuddelsgmbh/jtokkit
- Grafana 官方模板库:https://grafana.com/grafana/dashboards/