一、为什么AI应用需要专门的可观测性?
在生产环境中,AI应用面临独特的可观测性挑战:
- 模型性能波动:同一模型在不同时间点的表现可能差异巨大
- 成本不可控:Token消耗、API调用费用可能意外飙升
- 响应时间不稳定:受网络、模型负载等多因素影响
- 数据质量影响:输入数据质量直接影响输出结果可靠性
可观测性的三大支柱:
- 指标(Metrics):量化性能、成本、质量等关键指标
- 日志(Logging):记录详细执行过程,便于问题排查
- 追踪(Tracing):端到端请求链路分析,定位瓶颈
二、Spring AI可观测性架构设计
2.1 整体监控架构
graph TB
A[用户请求] --> B[Spring AI应用]
B --> C[指标收集]
B --> D[日志记录]
B --> E[链路追踪]
C --> F[Prometheus]
D --> G[ELK Stack]
E --> H[Jaeger]
F --> I[Grafana看板]
G --> I
H --> I
I --> J[告警通知]
2.2 核心依赖配置
<!-- pom.xml -->
<dependencies>
<!-- Spring Boot Actuator -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer Prometheus -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Sleuth分布式追踪 -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<!-- Zipkin链路追踪 -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
</dependencies>
# application.yml - 可观测性配置
management:
endpoints:
web:
exposure:
include: health,info,metrics,prometheus
endpoint:
metrics:
enabled: true
prometheus:
enabled: true
metrics:
export:
prometheus:
enabled: true
    distribution:
      percentiles:
        # 分位数需要按具体指标名(前缀)配置
        "[ai.response.time]": [0.5, 0.95, 0.99]
spring:
sleuth:
sampler:
probability: 1.0 # 全量采样,生产环境可调整
zipkin:
base-url: http://localhost:9411
三、指标监控体系实战
3.1 核心业务指标定义
@Component
public class AiMetricsConfig {
private final MeterRegistry meterRegistry;
// 定义关键业务指标
private final Counter requestCounter;
private final Timer responseTimer;
    private final DistributionSummary tokenDistribution;
public AiMetricsConfig(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
// 请求计数器
this.requestCounter = Counter.builder("ai.requests.total")
.description("AI请求总数")
.tag("type", "chat")
.register(meterRegistry);
// 响应时间计时器
        this.responseTimer = Timer.builder("ai.response.time")
            .description("AI请求响应时间")
            .publishPercentiles(0.5, 0.95, 0.99)
            .publishPercentileHistogram() // 同时导出直方图桶,供Prometheus端用histogram_quantile计算分位数
            .register(meterRegistry);
// Token分布统计
this.tokenDistribution = DistributionSummary.builder("ai.tokens.usage")
.description("Token使用量分布")
.baseUnit("tokens")
.register(meterRegistry);
}
// 记录模型调用指标
public void recordModelInvocation(String modelName, long durationMs,
int promptTokens, int completionTokens) {
// 记录基础指标
requestCounter.increment();
responseTimer.record(durationMs, TimeUnit.MILLISECONDS);
tokenDistribution.record(promptTokens + completionTokens);
        // 记录模型维度指标
        Counter.builder("ai.requests.by.model")
            .tag("model", modelName)
            .register(meterRegistry)
            .increment();
        DistributionSummary.builder("ai.tokens.usage.by.model")
            .tag("model", modelName)
            .baseUnit("tokens")
            .register(meterRegistry)
            .record(promptTokens + completionTokens);
        // 记录成本指标(估算):用Counter累计总成本,便于用increase()统计任意时间窗内的花费
        double cost = calculateCost(modelName, promptTokens, completionTokens);
        Counter.builder("ai.cost.estimated")
            .tag("model", modelName)
            .register(meterRegistry)
            .increment(cost);
}
private double calculateCost(String modelName, int promptTokens, int completionTokens) {
// 国内模型定价(元/千Token)
Map<String, Double> pricing = Map.of(
"qwen-max", 0.12,
"doubao-pro", 0.08,
"deepseek-chat", 0.06,
"glm-4", 0.10
);
double rate = pricing.getOrDefault(modelName, 0.1);
return (promptTokens + completionTokens) * rate / 1000;
}
}
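为了更直观地理解指标的记录时机,下面给出一段调用示意(假设存在一个业务层的 ChatService,类名与读取 Token 用量的方式均为示例,不同 Spring AI 版本的元数据 API 可能略有差异;若已启用 3.2 节的监控切面,业务代码中无需重复记录):
@Service
public class ChatService {

    private final ChatClient chatClient;
    private final AiMetricsConfig aiMetricsConfig;

    public ChatService(ChatClient chatClient, AiMetricsConfig aiMetricsConfig) {
        this.chatClient = chatClient;
        this.aiMetricsConfig = aiMetricsConfig;
    }

    public String ask(String question) {
        long start = System.currentTimeMillis();
        ChatResponse response = chatClient.call(new Prompt(question));
        long duration = System.currentTimeMillis() - start;

        // 假设响应元数据中携带Usage信息(不同模型提供商的支持情况可能不同)
        Usage usage = response.getMetadata().getUsage();
        aiMetricsConfig.recordModelInvocation(
                "qwen-max",                               // 示例模型名
                duration,
                usage.getPromptTokens().intValue(),
                usage.getGenerationTokens().intValue());

        return response.getResult().getOutput().getContent();
    }
}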
3.2 模型性能监控切面
@Aspect
@Component
public class ModelMonitoringAspect {
    private final AiMetricsConfig metricsConfig;

    public ModelMonitoringAspect(AiMetricsConfig metricsConfig) {
        this.metricsConfig = metricsConfig;
    }

    // ChatClient是接口而非注解,这里用execution切点匹配其所有实现类的方法
    @Around("execution(* org.springframework.ai.chat.ChatClient+.*(..))")
public Object monitorModelCalls(ProceedingJoinPoint joinPoint) throws Throwable {
String modelName = extractModelName(joinPoint);
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
// 记录成功指标
long duration = System.currentTimeMillis() - startTime;
recordSuccessMetrics(modelName, duration, result);
return result;
} catch (Exception e) {
// 记录失败指标
recordErrorMetrics(modelName, e);
throw e;
}
}
private void recordSuccessMetrics(String modelName, long duration, Object result) {
if (result instanceof ChatResponse response) {
int promptTokens = extractPromptTokens(response);
int completionTokens = extractCompletionTokens(response);
metricsConfig.recordModelInvocation(
modelName, duration, promptTokens, completionTokens
);
}
}
private void recordErrorMetrics(String modelName, Exception e) {
Counter.builder("ai.requests.errors")
.tag("model", modelName)
.tag("error_type", e.getClass().getSimpleName())
.register(Metrics.globalRegistry)
.increment();
}
}
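切面中引用的 extractModelName、extractPromptTokens、extractCompletionTokens 三个方法并未展开,下面给出一种可能的实现示意(放在 ModelMonitoringAspect 内部;假设响应元数据中携带 Usage,模型名的获取方式依实际接入方式而定):
    // 以下均为示意实现,具体取法依赖所用Spring AI版本与模型提供商返回的元数据
    private String extractModelName(ProceedingJoinPoint joinPoint) {
        // 简化处理:返回配置的默认模型名;也可以从Prompt携带的ChatOptions中解析
        return "qwen-max";
    }

    private int extractPromptTokens(ChatResponse response) {
        Usage usage = response.getMetadata().getUsage();
        return usage != null ? usage.getPromptTokens().intValue() : 0;
    }

    private int extractCompletionTokens(ChatResponse response) {
        Usage usage = response.getMetadata().getUsage();
        return usage != null ? usage.getGenerationTokens().intValue() : 0;
    }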
四、分布式链路追踪实战
4.1 请求链路追踪配置
@Configuration
public class TracingConfig {
@Bean
public CurrentTraceContext currentTraceContext() {
return ThreadLocalCurrentTraceContext.newBuilder()
.build();
}
@Bean
public Sampler alwaysSampler() {
return Sampler.ALWAYS_SAMPLE;
}
}
// 自定义追踪器
@Component
public class AiRequestTracer {
private final Tracer tracer;
public AiRequestTracer(Tracer tracer) {
this.tracer = tracer;
}
public Span startAiRequestSpan(String operation, String model) {
return tracer.nextSpan()
.name(operation)
.tag("ai.model", model)
.tag("ai.operation", operation)
.start();
}
public void recordTokenUsage(Span span, int promptTokens, int completionTokens) {
span.tag("ai.prompt_tokens", String.valueOf(promptTokens));
span.tag("ai.completion_tokens", String.valueOf(completionTokens));
span.tag("ai.total_tokens",
String.valueOf(promptTokens + completionTokens));
}
public void recordModelResponse(Span span, String responseSummary) {
span.tag("ai.response_summary",
responseSummary.substring(0, Math.min(100, responseSummary.length())));
span.event("ai.response.completed");
}
}
4.2 链路追踪集成示例
@RestController
@RequestMapping("/api/chat")
public class ChatController {
    private final ChatClient chatClient;
    private final Tracer tracer;
    private final AiRequestTracer aiTracer;

    public ChatController(ChatClient chatClient, Tracer tracer, AiRequestTracer aiTracer) {
        this.chatClient = chatClient;
        this.tracer = tracer;
        this.aiTracer = aiTracer;
    }
@PostMapping
public ResponseEntity<ChatResponse> chat(
@RequestBody ChatRequest request,
@RequestHeader(value = "X-User-Id", required = false) String userId) {
Span span = aiTracer.startAiRequestSpan("chat_completion", "qwen-max");
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
// 记录请求信息
span.tag("user.id", userId != null ? userId : "anonymous");
span.tag("request.length", String.valueOf(request.getMessage().length()));
// 执行AI调用
            ChatResponse response = chatClient.call(new Prompt(request.getMessage()));
// 记录响应信息
aiTracer.recordTokenUsage(span,
extractPromptTokens(response),
extractCompletionTokens(response));
aiTracer.recordModelResponse(span,
response.getResult().getOutput().getContent());
return ResponseEntity.ok(response);
} catch (Exception e) {
span.error(e);
span.tag("error", "true");
throw e;
} finally {
            span.end();
}
}
}
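ChatRequest 在文中未给出定义,这里补一个最简单的请求 DTO 示意(字段名为假设):
public class ChatRequest {

    private String message;  // 用户输入内容

    public String getMessage() {
        return message;
    }

    public void setMessage(String message) {
        this.message = message;
    }
}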
五、智能告警与自适应调控
5.1 多维度告警规则
# alert-rules.yml
groups:
- name: ai-monitoring
rules:
- alert: HighErrorRate
expr: rate(ai_requests_errors_total[5m]) / rate(ai_requests_total[5m]) > 0.1
for: 2m
labels:
severity: warning
annotations:
summary: "AI服务错误率过高"
description: "错误率超过10%,当前值: {{ $value }}"
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(ai_response_time_seconds_bucket[5m])) > 10
for: 3m
labels:
severity: critical
annotations:
summary: "AI服务响应时间过长"
description: "P95响应时间超过10秒,当前值: {{ $value }}s"
- alert: CostSpike
        expr: increase(ai_cost_estimated_total[1h]) > 100
for: 5m
labels:
severity: warning
annotations:
summary: "AI服务成本异常飙升"
description: "小时成本超过100元,当前值: {{ $value }}元"
- alert: ModelPerformanceDegradation
        # 成功率直接由请求与错误计数推导,也可改用recording rule预先计算
        expr: 1 - (rate(ai_requests_errors_total[10m]) / rate(ai_requests_total[10m])) < 0.8
for: 10m
labels:
severity: critical
annotations:
summary: "模型性能严重下降"
description: "成功率低于80%,当前值: {{ $value }}"
5.2 自适应调控策略
@Service
public class AdaptiveControlService {

    private static final Logger log = LoggerFactory.getLogger(AdaptiveControlService.class);

    private final MeterRegistry meterRegistry;
    private final Map<String, CircuitBreaker> circuitBreakers = new ConcurrentHashMap<>();

    public AdaptiveControlService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }
@Scheduled(fixedRate = 30000) // 每30秒检查一次
public void adjustModelRouting() {
// 获取各模型实时指标
Map<String, ModelMetrics> metrics = getRealTimeMetrics();
metrics.forEach((model, metric) -> {
// 如果错误率过高,触发熔断
if (metric.getErrorRate() > 0.3) {
circuitBreakers.computeIfAbsent(model, this::createCircuitBreaker)
.openCircuit();
log.warn("模型 {} 错误率过高,触发熔断", model);
}
// 动态调整流量权重
double newWeight = calculateOptimalWeight(metric);
updateModelWeight(model, newWeight);
});
}
private double calculateOptimalWeight(ModelMetrics metric) {
// 基于响应时间、错误率、成本综合计算权重
double responseScore = Math.max(0, 1 - metric.getP95ResponseTime() / 30.0);
double errorScore = Math.max(0, 1 - metric.getErrorRate());
double costScore = Math.max(0, 1 - metric.getCostPerRequest() / 0.1);
return (responseScore * 0.4 + errorScore * 0.4 + costScore * 0.2);
}
@EventListener
public void handleAlert(AlertEvent event) {
switch (event.getAlertType()) {
case "HighErrorRate":
// 自动切换到备用模型
switchToFallbackModel(event.getModel());
break;
case "HighResponseTime":
// 降低该模型的流量权重
reduceModelWeight(event.getModel(), 0.5);
break;
case "CostSpike":
// 切换到成本更低的模型
switchToCostEffectiveModel();
break;
}
}
}
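AdaptiveControlService 依赖的 ModelMetrics 与 AlertEvent 文中未给出,下面是一个假设的最小数据结构示意(字段与单位均为示例,AlertEvent 可由 Alertmanager 的 webhook 回调转换而来):
// 模型实时指标快照(示意)
public class ModelMetrics {

    private final double errorRate;        // 错误率,取值0~1
    private final double p95ResponseTime;  // P95响应时间,单位:秒
    private final double costPerRequest;   // 单次请求平均成本,单位:元

    public ModelMetrics(double errorRate, double p95ResponseTime, double costPerRequest) {
        this.errorRate = errorRate;
        this.p95ResponseTime = p95ResponseTime;
        this.costPerRequest = costPerRequest;
    }

    public double getErrorRate() { return errorRate; }
    public double getP95ResponseTime() { return p95ResponseTime; }
    public double getCostPerRequest() { return costPerRequest; }
}

// 告警事件(示意)
public class AlertEvent {

    private final String alertType;  // 对应告警规则名,如HighErrorRate
    private final String model;      // 受影响的模型名

    public AlertEvent(String alertType, String model) {
        this.alertType = alertType;
        this.model = model;
    }

    public String getAlertType() { return alertType; }
    public String getModel() { return model; }
}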
六、Grafana监控看板实战
6.1 综合监控看板配置
{
"dashboard": {
"title": "Spring AI应用监控看板",
"panels": [
{
"title": "请求量与错误率",
"type": "graph",
"targets": [
{
"expr": "rate(ai_requests_total[5m])",
"legendFormat": "总请求量"
},
{
"expr": "rate(ai_requests_errors_total[5m])",
"legendFormat": "错误量"
}
]
},
{
"title": "模型响应时间分布",
"type": "heatmap",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(ai_response_time_seconds_bucket[5m])) by (model)",
"legendFormat": "P95-{{model}}"
}
]
},
{
"title": "各模型Token消耗",
"type": "piechart",
"targets": [
{
"expr": "sum by (model) (ai_tokens_usage_sum)",
"legendFormat": "{{model}}"
}
]
},
{
"title": "实时成本估算",
"type": "stat",
"targets": [
{
"expr": "sum(ai_cost_estimated)",
"legendFormat": "总成本"
}
]
}
]
}
}
6.2 业务级监控面板
@RestController
@RequestMapping("/api/monitoring")
public class BusinessMonitoringController {

    private final ModelAnalyticsService modelAnalyticsService;

    public BusinessMonitoringController(ModelAnalyticsService modelAnalyticsService) {
        this.modelAnalyticsService = modelAnalyticsService;
    }
@GetMapping("/business-metrics")
public BusinessMetrics getBusinessMetrics(
@RequestParam(defaultValue = "24") int hours) {
Instant since = Instant.now().minus(hours, ChronoUnit.HOURS);
return BusinessMetrics.builder()
.successRate(calculateSuccessRate(since))
.avgResponseTime(calculateAvgResponseTime(since))
.totalCost(calculateTotalCost(since))
.topModels(getTopPerformingModels(since, 5))
.recentIssues(getRecentIssues(since))
.build();
}
@GetMapping("/model-comparison")
public ModelComparison compareModels(
@RequestParam String timeframe) {
return modelAnalyticsService.compareModels(
TimeRange.valueOf(timeframe.toUpperCase()));
}
}
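控制器返回的 BusinessMetrics 同样是示意类型,下面给出一个基于 Lombok @Builder 的假设定义(字段类型仅供参考):
@Data
@Builder
public class BusinessMetrics {

    private double successRate;        // 成功率,0~1
    private double avgResponseTime;    // 平均响应时间,单位:毫秒
    private double totalCost;          // 累计成本,单位:元
    private List<String> topModels;    // 表现最好的模型列表(示意)
    private List<String> recentIssues; // 近期问题摘要(示意)
}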
七、生产环境最佳实践
7.1 监控数据采样策略
@Configuration
public class SamplingConfig {
@Bean
public Sampler adaptiveSampler() {
return new Sampler() {
@Override
public boolean isSampled(long traceId) {
// 重要业务100%采样,普通请求按比例采样
String businessType = getBusinessTypeFromContext();
return switch (businessType) {
case "payment", "auth" -> true; // 关键业务全采样
case "search", "recommend" -> Math.random() < 0.5; // 50%采样
default -> Math.random() < 0.1; // 10%采样
};
}
};
}
}
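示例中的 getBusinessTypeFromContext() 没有给出实现,取值方式依业务而定。下面是一种基于 MDC 的假设写法,前提是请求入口处的过滤器已经把业务类型写入 MDC:
    // 示意实现:从MDC中读取上游过滤器写入的业务类型,缺省归为default
    private String getBusinessTypeFromContext() {
        String type = MDC.get("businessType");
        return type != null ? type : "default";
    }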
7.2 敏感信息脱敏处理
@Component
public class SensitiveDataFilter {
private final List<Pattern> sensitivePatterns = Arrays.asList(
Pattern.compile("\\b\\d{4}[ -]?\\d{4}[ -]?\\d{4}[ -]?\\d{4}\\b"), // 银行卡号
Pattern.compile("\\b\\d{17}[\\dXx]\\b"), // 身份证号
Pattern.compile("\\b1[3-9]\\d{9}\\b") // 手机号
);
public String filterSensitiveData(String text) {
if (text == null) return null;
String filtered = text;
for (Pattern pattern : sensitivePatterns) {
filtered = pattern.matcher(filtered).replaceAll("***");
}
return filtered;
}
}
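脱敏过滤器应在数据写入日志或链路标签之前调用,例如配合 4.1 节的 AiRequestTracer 使用(SafeTracingHelper 为示意类名):
@Component
public class SafeTracingHelper {

    private final SensitiveDataFilter sensitiveDataFilter;
    private final AiRequestTracer aiTracer;

    public SafeTracingHelper(SensitiveDataFilter sensitiveDataFilter, AiRequestTracer aiTracer) {
        this.sensitiveDataFilter = sensitiveDataFilter;
        this.aiTracer = aiTracer;
    }

    // 在记录响应摘要前先脱敏,避免银行卡号、身份证号等进入追踪系统
    public void recordSafeResponse(Span span, String rawResponse) {
        String safeSummary = sensitiveDataFilter.filterSensitiveData(rawResponse);
        aiTracer.recordModelResponse(span, safeSummary);
    }
}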
总结
本文详细介绍了Spring AI应用的可观测性体系建设,涵盖指标监控、链路追踪、智能告警等关键环节。通过实施完整的监控体系,团队能够实时掌握AI应用运行状态,快速定位问题,优化性能与成本,为业务稳定运行提供坚实保障。