Spring AI Alibaba Observability in Practice: Metrics, Tracing, and Logging for AI Applications
Overview: An AI application in production needs a solid observability stack. Token consumption translates directly into cost, response latency shapes user experience, and when something breaks without root-cause visibility, you are reduced to guesswork. This article builds a full-stack observability solution for AI applications, covering metrics, traces, and logs, on top of Micrometer, Prometheus, and OpenTelemetry.
1. What Makes AI Applications Different to Observe
Monitoring for a traditional web application focuses on three dimensions: QPS, latency, and error rate. AI applications add several dimensions of their own:
Traditional web monitoring:
- QPS / latency / error rate / resource utilization
AI applications additionally track:
- Token consumption (maps directly to spend)
- Response quality (hallucination rate, relevance)
- Usage distribution across models
- Prompt/response content auditing
- Tool-call success rate
- RAG retrieval hit rate
Token consumption is the most distinctive of these: it is a direct business cost and must be monitored at fine granularity.
2. Dependencies
xml
<!-- Spring Boot Actuator -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer + Prometheus -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- OpenTelemetry tracing -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<!-- Spring AI built-in observability support -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-autoconfigure-model-observability</artifactId>
</dependency>
3. Actuator Base Configuration
yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers
  endpoint:
    health:
      show-details: always
      # Health checks include AI model availability
      show-components: always
    prometheus:
      enabled: true
  metrics:
    export:
      prometheus:
        enabled: true
    tags:
      # Global tags: every metric carries the application name and environment
      application: ${spring.application.name}
      environment: ${spring.profiles.active:unknown}

spring:
  ai:
    chat:
      # Enable Micrometer metric collection for AI calls
      observations:
        enabled: true
        # Record prompt content in spans? (keep false in production to avoid leaking private data)
        include-prompt: false
        # Record the full model response?
        include-completion: false
4. Token Usage Metrics: The Core of Cost Monitoring
4.1 Spring AI's Built-in Token Metrics
Spring AI 1.1 already ships with token-related metrics, collected automatically by ChatModelMeterBinder:
# Core metric names (Micrometer format)
gen_ai.client.token.usage            # tokens consumed
  tags:
    ai.operation.name     = "chat"
    gen_ai.response.model = "qwen-turbo"
    gen_ai.token.type     = "input" / "output" / "total"
gen_ai.client.operation.duration     # model call latency (seconds)
  tags:
    ai.operation.name     = "chat"
    gen_ai.response.model = "qwen-turbo"
    gen_ai.request.model  = "qwen-turbo"
gen_ai.client.operation.error        # error count
  tags:
    error.type = "TimeoutException"
Hitting /actuator/prometheus shows these metrics:
# HELP gen_ai_client_token_usage_tokens
# TYPE gen_ai_client_token_usage_tokens counter
gen_ai_client_token_usage_tokens_total{
  ai_operation_name="chat",
  gen_ai_response_model="qwen-turbo",
  gen_ai_token_type="input"
} 15234.0
gen_ai_client_token_usage_tokens_total{
  ai_operation_name="chat",
  gen_ai_response_model="qwen-turbo",
  gen_ai_token_type="output"
} 8921.0
4.2 Custom Cost Metrics
Building on the built-in metrics, add a cost dimension:
java
@Component
@Slf4j
public class AiCostMeterBinder implements MeterBinder {

    // Per-model token prices (yuan per 1K tokens)
    private static final Map<String, Double> INPUT_PRICE = Map.of(
            "qwen-turbo", 0.002,
            "qwen-plus", 0.004,
            "qwen-max", 0.04,
            "qwen-long", 0.0005,
            "gpt-4o-mini", 0.03
    );
    private static final Map<String, Double> OUTPUT_PRICE = Map.of(
            "qwen-turbo", 0.006,
            "qwen-plus", 0.012,
            "qwen-max", 0.12,
            "qwen-long", 0.002,
            "gpt-4o-mini", 0.09
    );

    private MeterRegistry registry;

    @Override
    public void bindTo(@NonNull MeterRegistry registry) {
        this.registry = registry;
        // Register the cost gauge (sampled on every scrape)
        Gauge.builder("ai.cost.estimate.yuan", this, AiCostMeterBinder::getTotalCost)
                .description("Estimated total cost of AI calls (yuan)")
                .register(registry);
    }

    /**
     * Record the token cost of one model call.
     */
    public void recordUsage(String model, long inputTokens, long outputTokens) {
        double inputCost = inputTokens / 1000.0 *
                INPUT_PRICE.getOrDefault(model, 0.01);
        double outputCost = outputTokens / 1000.0 *
                OUTPUT_PRICE.getOrDefault(model, 0.02);
        Counter.builder("ai.cost.tokens")
                .tag("model", model)
                .tag("token_type", "input")
                .description("Input tokens consumed")
                .register(registry)
                .increment(inputTokens);
        Counter.builder("ai.cost.tokens")
                .tag("model", model)
                .tag("token_type", "output")
                .register(registry)
                .increment(outputTokens);
        log.info("[Cost] model={}, inputTokens={}, outputTokens={}, callCost=¥{}",
                model, inputTokens, outputTokens,
                String.format("%.6f", inputCost + outputCost));
    }

    private double getTotalCost() {
        // In a real project, accumulate cost in an AtomicLong holding micro-yuan
        // (avoids floating-point drift), or read the running total from a database/cache
        return 0.0;
    }
}
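The pricing arithmetic is easy to sanity-check in isolation. A minimal sketch in plain Java (the `CostDemo` class and `estimate` helper are hypothetical; the prices mirror the tables above): 1500 input tokens plus 500 output tokens on qwen-plus should cost 1500/1000 × 0.004 + 500/1000 × 0.012 = ¥0.012.

```java
import java.util.Map;

public class CostDemo {
    // Hypothetical price table mirroring the binder above: {input, output} yuan per 1K tokens
    static final Map<String, double[]> PRICE = Map.of(
            "qwen-plus", new double[]{0.004, 0.012},
            "qwen-turbo", new double[]{0.002, 0.006});

    static double estimate(String model, long inputTokens, long outputTokens) {
        // Unknown models fall back to the same defaults used in recordUsage()
        double[] p = PRICE.getOrDefault(model, new double[]{0.01, 0.02});
        return inputTokens / 1000.0 * p[0] + outputTokens / 1000.0 * p[1];
    }

    public static void main(String[] args) {
        // 1.5K input + 0.5K output on qwen-plus -> 0.006 + 0.006 yuan
        System.out.printf("%.6f%n", estimate("qwen-plus", 1500, 500)); // prints 0.012000
    }
}
```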
5. Distributed Tracing: OpenTelemetry Integration
5.1 OpenTelemetry Configuration
yaml
management:
  tracing:
    sampling:
      probability: 1.0   # 100% sampling in development; 0.1 recommended in production
otel:
  exporter:
    otlp:
      # Export to Jaeger / SkyWalking / Zipkin, etc.
      endpoint: http://localhost:4318
      protocol: http/protobuf
  traces:
    sampler:
      type: parentbased_traceidratio
      ratio: 0.1
5.2 Propagating the TraceId into AI Calls
java
@Slf4j
@Component
public class TraceAwareAdvisor implements RequestResponseAdvisor {

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request,
                                        Map<String, Object> context) {
        // Grab the current trace's TraceId and stash it in the advisor context
        Span currentSpan = Span.current();
        if (currentSpan.getSpanContext().isValid()) {
            String traceId = currentSpan.getSpanContext().getTraceId();
            context.put("traceId", traceId);
            log.debug("[Trace] AI call, traceId={}", traceId);
        }
        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response,
                                       Map<String, Object> context) {
        // Attach custom AI attributes to the active span
        Span currentSpan = Span.current();
        if (response.getMetadata() != null) {
            var usage = response.getMetadata().getUsage();
            if (usage != null) {
                // Record token usage on the trace span
                currentSpan.setAttribute("gen_ai.input_tokens", usage.getPromptTokens());
                currentSpan.setAttribute("gen_ai.output_tokens", usage.getGenerationTokens());
            }
        }
        return response;
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}
5.3 Custom Span Attributes for LLM Calls
java
@Service
@RequiredArgsConstructor
public class ObservableChatService {

    private final ChatClient chatClient;
    private final Tracer tracer;

    /**
     * AI call with explicit tracing.
     */
    public String tracedChat(String message, String userId) {
        // Start a new span marked as an LLM call
        Span span = tracer.spanBuilder("llm.chat")
                .setAttribute("llm.vendor", "aliyun")
                .setAttribute("llm.model", "qwen-plus")
                .setAttribute("user.id", userId)
                // In production, record prompt content only after desensitization;
                // here we record just its length
                .setAttribute("llm.prompt.length", message.length())
                .startSpan();
        try (Scope scope = span.makeCurrent()) {
            String response = chatClient.prompt(message).call().content();
            span.setAttribute("llm.response.length", response.length());
            span.setStatus(StatusCode.OK);
            return response;
        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            span.end();
        }
    }
}
6. Structured Logging: Prompt Desensitization and Safety Filtering
java
@Component
@Slf4j
public class StructuredLogAdvisor implements RequestResponseAdvisor {

    // Regexes for sensitive data (phone number, national ID, bank card, API keys)
    private static final Pattern PHONE_PATTERN =
            Pattern.compile("1[3-9]\\d{9}");
    private static final Pattern ID_CARD_PATTERN =
            Pattern.compile("\\d{17}[\\dXx]");
    private static final Pattern BANK_CARD_PATTERN =
            Pattern.compile("\\d{16,19}");
    private static final Pattern API_KEY_PATTERN =
            Pattern.compile("(?i)(api[-_]?key|token|secret)[\\s:=]+[\\w-]+");

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request,
                                        Map<String, Object> context) {
        // Structured log of the request (after desensitization)
        String userText = desensitize(request.userText());
        log.info("""
                [AI Request] model={} userPromptLength={} systemPromptLength={}
                userPromptPreview='{}'
                """,
                request.chatOptions() != null
                        ? request.chatOptions().getModel() : "default",
                userText.length(),
                request.systemText() != null ? request.systemText().length() : 0,
                truncate(userText, 100)   // log only the first 100 characters
        );
        context.put("requestTime", System.currentTimeMillis());
        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response,
                                       Map<String, Object> context) {
        long duration = System.currentTimeMillis()
                - (long) context.getOrDefault("requestTime", System.currentTimeMillis());
        String content = response.getResult() != null
                ? response.getResult().getOutput().getContent() : "";
        log.info("""
                [AI Response] duration={}ms contentLength={} model={}
                inputTokens={} outputTokens={}
                """,
                duration,
                content != null ? content.length() : 0,
                response.getMetadata() != null ? response.getMetadata().getModel() : "unknown",
                response.getMetadata() != null && response.getMetadata().getUsage() != null
                        ? response.getMetadata().getUsage().getPromptTokens() : 0,
                response.getMetadata() != null && response.getMetadata().getUsage() != null
                        ? response.getMetadata().getUsage().getGenerationTokens() : 0
        );
        return response;
    }

    /**
     * Mask sensitive data before logging.
     */
    private String desensitize(String text) {
        if (text == null) return "";
        return text
                .replaceAll(PHONE_PATTERN.pattern(), "***PHONE***")
                .replaceAll(ID_CARD_PATTERN.pattern(), "***IDCARD***")
                .replaceAll(BANK_CARD_PATTERN.pattern(), "***BANKCARD***")
                .replaceAll(API_KEY_PATTERN.pattern(), "$1=***APIKEY***");
    }

    private String truncate(String text, int maxLength) {
        if (text == null) return "";
        return text.length() > maxLength
                ? text.substring(0, maxLength) + "..." : text;
    }

    @Override
    public int getOrder() {
        return Integer.MIN_VALUE;   // highest precedence: mask before anything else logs
    }
}
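To see the masking behavior outside of Spring, the same regexes can be exercised in a few lines of plain Java (the `DesensitizeDemo` class is a hypothetical standalone harness; the patterns are the ones used by the advisor above):

```java
public class DesensitizeDemo {
    // Same ordering as the advisor: phone, then ID card, then bank card,
    // so an 18-digit ID is not swallowed by the 16-19 digit bank-card rule
    static String desensitize(String text) {
        return text
                .replaceAll("1[3-9]\\d{9}", "***PHONE***")
                .replaceAll("\\d{17}[\\dXx]", "***IDCARD***")
                .replaceAll("\\d{16,19}", "***BANKCARD***");
    }

    public static void main(String[] args) {
        System.out.println(desensitize("Call me at 13812345678 today"));
        // prints: Call me at ***PHONE*** today
    }
}
```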
7. Health Checks: Probing Model Availability
java
@Slf4j
@Component
@RequiredArgsConstructor
public class AiHealthIndicator implements HealthIndicator {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    @Override
    public Health health() {
        Health.Builder builder = new Health.Builder();
        // Check DashScope model availability
        boolean modelHealthy = checkModelHealth();
        // Check vector-store connectivity
        boolean vectorStoreHealthy = checkVectorStoreHealth();
        builder.withDetail("dashscope", modelHealthy ? "UP" : "DOWN")
                .withDetail("vectorStore", vectorStoreHealthy ? "UP" : "DOWN");
        if (modelHealthy && vectorStoreHealthy) {
            return builder.up().build();
        } else if (modelHealthy) {
            // A vector-store outage does not break basic chat
            return builder.status("DEGRADED").build();
        } else {
            return builder.down().build();
        }
    }

    private boolean checkModelHealth() {
        try {
            String result = chatClient.prompt("ping")
                    .options(DashScopeChatOptions.builder()
                            .withModel("qwen-turbo")
                            .withMaxTokens(10)   // shortest possible reply, saves tokens
                            .build())
                    .call()
                    .content();
            return result != null && !result.isEmpty();
        } catch (Exception e) {
            log.warn("Model health check failed: {}", e.getMessage());
            return false;
        }
    }

    private boolean checkVectorStoreHealth() {
        try {
            // Run a minimal-cost search to verify connectivity
            vectorStore.similaritySearch(
                    SearchRequest.defaults().withTopK(1));
            return true;
        } catch (Exception e) {
            log.warn("Vector store health check failed: {}", e.getMessage());
            return false;
        }
    }
}
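The aggregation rule in `health()` boils down to a three-way mapping from the two probe results. Distilled into plain Java (the `HealthDemo` class and `Status` enum are hypothetical stand-ins for Spring's `Health` statuses):

```java
public class HealthDemo {
    enum Status { UP, DEGRADED, DOWN }

    // A vector-store failure only degrades RAG features; plain chat keeps working,
    // so the model probe alone decides between DEGRADED and DOWN
    static Status aggregate(boolean modelHealthy, boolean vectorStoreHealthy) {
        if (modelHealthy && vectorStoreHealthy) return Status.UP;
        if (modelHealthy) return Status.DEGRADED;
        return Status.DOWN;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(true, false)); // prints DEGRADED
    }
}
```

Exposing DEGRADED as a distinct status lets load balancers keep routing chat traffic while paging the on-call for the retrieval path.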
8. Prometheus Alerting Rules
The following Prometheus rules cover the failure scenarios AI applications hit most often:
yaml
# prometheus-rules.yml
groups:
  - name: ai-application-alerts
    rules:
      # Alert 1: token usage spike (possible runaway loop or abuse)
      - alert: AiTokenUsageSpike
        expr: |
          rate(gen_ai_client_token_usage_tokens_total[5m]) > 10000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Abnormal AI token consumption rate"
          description: "Token consumption exceeds 10000 tokens/s (5m average); check for runaway calls"
      # Alert 2: slow AI responses (P99 > 30s)
      - alert: AiResponseLatencyHigh
        expr: |
          histogram_quantile(0.99,
            rate(gen_ai_client_operation_duration_seconds_bucket[5m])) > 30
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "AI service latency too high"
          description: "P99 latency of AI calls exceeds 30 seconds; user experience is likely degraded"
      # Alert 3: error rate above threshold
      - alert: AiErrorRateHigh
        expr: |
          rate(gen_ai_client_operation_error_total[5m]) /
          rate(gen_ai_client_operation_duration_seconds_count[5m]) > 0.05
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "AI service error rate too high"
          description: "AI call error rate exceeds 5%; check network connectivity and API key status"
      # Alert 4: daily cost over budget (rough flat-rate estimate of 0.001 yuan/token)
      - alert: AiDailyCostOverBudget
        expr: |
          increase(ai_cost_tokens_total[24h]) * 0.001 > 1000
        labels:
          severity: warning
        annotations:
          summary: "Daily AI spend over budget"
          description: "Estimated AI spend today exceeds 1000 yuan; check for abnormal usage"
9. Core Grafana Dashboard Panels
Recommended dashboard layout (four rows):
+----------------------+----------------------+----------------------+
| Live QPS             | P50 / P99 latency    | Error rate           |
+----------------------+----------------------+----------------------+
| Tokens per minute    | Usage by model (pie) | Estimated cost today |
+----------------------+----------------------+----------------------+
| Call-count heatmap   | Tool-call success %  | RAG hit rate         |
+----------------------+----------------------+----------------------+
| Last 10 error logs   | Health check status  | Active sessions      |
+----------------------+----------------------+----------------------+
Key PromQL queries:
promql
# QPS
rate(gen_ai_client_operation_duration_seconds_count[1m])

# P99 latency
histogram_quantile(0.99, rate(gen_ai_client_operation_duration_seconds_bucket[5m]))

# Token consumption rate, grouped by model
sum by (gen_ai_response_model) (rate(gen_ai_client_token_usage_tokens_total[5m]))

# Error rate
rate(gen_ai_client_operation_error_total[5m])
  / rate(gen_ai_client_operation_duration_seconds_count[5m])
10. Summary
An AI observability stack is built in three layers:
- Metrics (Micrometer + Prometheus): token consumption, call latency, error rate, model distribution;
- Traces (OpenTelemetry + Jaeger/Zipkin): the complete call chain of every AI invocation, for quickly pinpointing slow calls;
- Logs (structured + desensitized): prompt/response JSON logs with sensitive data masked, ready for search and analysis.
None of the three is optional; together they form the operational brain of an AI application.
The next article moves into agent development: implementing ReAct, Plan-And-Execute, and other agent architectures with Spring AI Alibaba 1.1.