Spring AI In Depth (9/50): Observability and Monitoring in Practice

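1. Why Do AI Applications Need Dedicated Observability?

In production, AI applications face observability challenges of their own:

  • Fluctuating model performance: the same model's latency and output quality can vary widely from one point in time to another

  • Uncontrolled cost: token consumption and API fees can spike unexpectedly

  • Unstable response times: latency depends on network conditions, model load, and other factors

  • Data quality effects: the quality of the input directly affects how reliable the output is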

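The three pillars of observability

  • Metrics: quantify key performance, cost, and quality indicators

  • Logging: record the execution process in detail to support troubleshooting

  • Tracing: follow each request end to end to locate bottlenecks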

2. Spring AI Observability Architecture

2.1 Overall Monitoring Architecture

graph TB
    A[User request] --> B[Spring AI application]
    B --> C[Metrics collection]
    B --> D[Logging]
    B --> E[Distributed tracing]
    C --> F[Prometheus]
    D --> G[ELK Stack]
    E --> H[Zipkin / Jaeger]
    F --> I[Grafana dashboards]
    G --> I
    H --> I
    I --> J[Alert notifications]

2.2 Core Dependencies and Configuration

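<!-- pom.xml (Spring AI runs on Spring Boot 3.x; Spring Cloud Sleuth does not support Boot 3,
     so tracing uses Micrometer Tracing, Sleuth's successor) -->
<dependencies>
    <!-- Spring Boot Actuator -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

    <!-- Micrometer Prometheus registry -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

    <!-- Micrometer Tracing with the Brave bridge (replaces Sleuth) -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-brave</artifactId>
    </dependency>

    <!-- Zipkin reporter that exports the Brave spans -->
    <dependency>
        <groupId>io.zipkin.reporter2</groupId>
        <artifactId>zipkin-reporter-brave</artifactId>
    </dependency>

    <!-- AOP support for the monitoring aspect in section 3.2 -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
</dependencies>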

# application.yml - observability configuration (Spring Boot 3.x property names)
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    distribution:
      # Publish histogram buckets for the AI timer so PromQL's histogram_quantile() can use them
      percentiles-histogram:
        "[ai.response.time]": true
  prometheus:
    metrics:
      export:
        enabled: true
  tracing:
    sampling:
      probability: 1.0 # sample every request; lower this in production
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans

3. Metrics Monitoring in Practice

3.1 Defining Core Business Metrics

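import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.Map;
import java.util.concurrent.TimeUnit;

@Component
public class AiMetricsConfig {

    // Pricing for domestic models (yuan per 1K tokens); illustrative, check vendors' current price lists
    private static final Map<String, Double> PRICING = Map.of(
        "qwen-max", 0.12,
        "doubao-pro", 0.08,
        "deepseek-chat", 0.06,
        "glm-4", 0.10
    );
    private static final double DEFAULT_RATE = 0.1;

    private final MeterRegistry meterRegistry;

    // Total request counter (incremented once per successful call)
    private final Counter requestCounter;

    public AiMetricsConfig(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;

        this.requestCounter = Counter.builder("ai.requests.total")
            .description("Total number of AI requests")
            .tag("type", "chat")
            .register(meterRegistry);
    }

    // Record the metrics of one model invocation.
    // register() returns the already-registered meter for the same name and tags,
    // so building the per-model meters on every call is safe.
    public void recordModelInvocation(String modelName, long durationMs,
                                      int promptTokens, int completionTokens) {
        requestCounter.increment();

        // Response time per model; the histogram buckets back histogram_quantile() in PromQL
        Timer.builder("ai.response.time")
            .description("AI request response time")
            .tag("model", modelName)
            .publishPercentileHistogram()
            .register(meterRegistry)
            .record(durationMs, TimeUnit.MILLISECONDS);

        // Token usage distribution per model (exported as ai_tokens_usage_tokens_*)
        DistributionSummary.builder("ai.tokens.usage")
            .description("Token usage distribution")
            .baseUnit("tokens")
            .tag("model", modelName)
            .register(meterRegistry)
            .record(promptTokens + completionTokens);

        // Requests per model
        Counter.builder("ai.requests.by.model")
            .tag("model", modelName)
            .register(meterRegistry)
            .increment();

        // Estimated cost as a counter that accumulates spend, so PromQL can apply increase()/rate().
        // (Metrics.gauge() with a boxed double holds only a weak reference and keeps the first value.)
        Counter.builder("ai.cost.estimated")
            .description("Estimated cost in yuan")
            .tag("model", modelName)
            .register(meterRegistry)
            .increment(calculateCost(modelName, promptTokens, completionTokens));
    }

    private double calculateCost(String modelName, int promptTokens, int completionTokens) {
        double rate = PRICING.getOrDefault(modelName, DEFAULT_RATE);
        return (promptTokens + completionTokens) * rate / 1000;
    }
}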
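The cost formula in `calculateCost` is total tokens times the price per 1K tokens, divided by 1000. The plain-Java sketch below reproduces it outside Spring so the arithmetic can be checked by hand; `CostEstimator` is a hypothetical name, and the prices are the figures from the map above, not verified vendor pricing.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical standalone version of calculateCost(); prices in yuan per 1K tokens,
// copied from the pricing map above (verify against current vendor price lists)
final class CostEstimator {

    private static final Map<String, Double> PRICING = Map.of(
        "qwen-max", 0.12,
        "doubao-pro", 0.08,
        "deepseek-chat", 0.06,
        "glm-4", 0.10
    );
    private static final double DEFAULT_RATE = 0.1; // fallback for unlisted models

    private CostEstimator() {}

    // cost = (promptTokens + completionTokens) * rate / 1000
    static double estimate(String model, int promptTokens, int completionTokens) {
        double rate = PRICING.getOrDefault(model, DEFAULT_RATE);
        return (promptTokens + completionTokens) * rate / 1000;
    }

    public static void main(String[] args) {
        // 1,500 prompt + 500 completion tokens on qwen-max: 2,000 * 0.12 / 1000
        System.out.println(String.format(Locale.ROOT, "%.2f", estimate("qwen-max", 1500, 500))); // prints 0.24
    }
}
```

Many providers price input and output tokens differently, so a single rate per model gives only a rough estimate.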

3.2 A Monitoring Aspect for Model Performance

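import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.ai.chat.metadata.ChatResponseMetadata;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Component;

import java.util.concurrent.TimeUnit;

@Aspect
@Component
public class ModelMonitoringAspect {

    private final AiMetricsConfig metricsConfig;
    private final MeterRegistry meterRegistry;

    public ModelMonitoringAspect(AiMetricsConfig metricsConfig, MeterRegistry meterRegistry) {
        this.metricsConfig = metricsConfig;
        this.meterRegistry = meterRegistry;
    }

    // Intercept ChatModel.call(Prompt) on every ChatModel bean (Spring AI 1.0 package names).
    // A ChatClient built from the ChatModel bean delegates to it, so its calls are captured too.
    // Self-invocations inside the model (e.g. the call(String) default method) bypass the proxy.
    @Around("execution(* org.springframework.ai.chat.model.ChatModel+.call(org.springframework.ai.chat.prompt.Prompt))")
    public Object monitorModelCalls(ProceedingJoinPoint joinPoint) throws Throwable {
        String modelName = extractModelName(joinPoint.getArgs());
        long start = System.nanoTime();

        try {
            Object result = joinPoint.proceed();

            // Record success metrics
            long durationMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            recordSuccessMetrics(modelName, durationMs, result);

            return result;

        } catch (Exception e) {
            // Record failure metrics, then rethrow unchanged
            recordErrorMetrics(modelName, e);
            throw e;
        }
    }

    // Model requested in the prompt options, or "unknown" if none was set
    private String extractModelName(Object[] args) {
        if (args.length > 0 && args[0] instanceof Prompt prompt
                && prompt.getOptions() != null && prompt.getOptions().getModel() != null) {
            return prompt.getOptions().getModel();
        }
        return "unknown";
    }

    private void recordSuccessMetrics(String modelName, long durationMs, Object result) {
        if (result instanceof ChatResponse response && response.getMetadata() != null) {
            ChatResponseMetadata metadata = response.getMetadata();
            // Fall back to the model name reported by the provider when the prompt did not set one
            String model = "unknown".equals(modelName) && metadata.getModel() != null
                ? metadata.getModel() : modelName;

            Usage usage = metadata.getUsage();
            int promptTokens = usage != null && usage.getPromptTokens() != null ? usage.getPromptTokens() : 0;
            int completionTokens = usage != null && usage.getCompletionTokens() != null ? usage.getCompletionTokens() : 0;

            metricsConfig.recordModelInvocation(model, durationMs, promptTokens, completionTokens);
        }
    }

    private void recordErrorMetrics(String modelName, Exception e) {
        Counter.builder("ai.requests.errors")
            .tag("model", modelName)
            .tag("error_type", e.getClass().getSimpleName())
            .register(meterRegistry)
            .increment();
    }
}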
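The aspect's control flow (start a clock, proceed, record success or error, rethrow) does not depend on AOP itself. The sketch below shows the same pattern as a plain wrapper around a `Supplier`, with in-memory counters standing in for Micrometer; `TimedCall` and its fields are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical stand-in for the aspect: same try/record/rethrow flow, no Spring or Micrometer
final class TimedCall {

    final LongAdder successes = new LongAdder();
    final Map<String, LongAdder> errorsByType = new ConcurrentHashMap<>();
    volatile long lastDurationMs = -1;

    <T> T call(Supplier<T> modelCall) {
        long start = System.nanoTime();
        try {
            T result = modelCall.get();
            lastDurationMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            successes.increment();              // plays the role of recordSuccessMetrics
            return result;
        } catch (RuntimeException e) {
            errorsByType.computeIfAbsent(e.getClass().getSimpleName(), k -> new LongAdder())
                .increment();                   // plays the role of recordErrorMetrics, keyed by error type
            throw e;                            // the caller still sees the original exception
        }
    }

    long errors(String type) {
        LongAdder adder = errorsByType.get(type);
        return adder == null ? 0 : adder.sum();
    }
}
```

Rethrowing the original exception keeps the monitoring transparent: callers and any upstream error handling behave exactly as they would without it.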

4. Distributed Tracing in Practice

4.1 Request Tracing Configuration

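import brave.sampler.Sampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Spring Boot 3 auto-configures Brave's CurrentTraceContext and the Micrometer Tracer,
// so only the sampler is overridden here. Keep at most one brave.sampler.Sampler bean:
// this one for development, or the adaptive sampler in section 7.1 for production.
@Configuration
public class TracingConfig {

    @Bean
    public Sampler alwaysSampler() {
        return Sampler.ALWAYS_SAMPLE;
    }
}

// Custom tracer (separate file), built on the Micrometer Tracing API
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Component;

@Component
public class AiRequestTracer {

    private final Tracer tracer;

    public AiRequestTracer(Tracer tracer) {
        this.tracer = tracer;
    }

    // Starts a child of the current span, or a new trace if there is none
    public Span startAiRequestSpan(String operation, String model) {
        return tracer.nextSpan()
            .name(operation)
            .tag("ai.model", model)
            .tag("ai.operation", operation)
            .start();
    }

    public void recordTokenUsage(Span span, int promptTokens, int completionTokens) {
        span.tag("ai.prompt_tokens", String.valueOf(promptTokens));
        span.tag("ai.completion_tokens", String.valueOf(completionTokens));
        span.tag("ai.total_tokens",
            String.valueOf(promptTokens + completionTokens));
    }

    // Keeps the first 100 characters only; pass the text through the filter in section 7.2 first
    public void recordModelResponse(Span span, String responseSummary) {
        if (responseSummary != null) {
            span.tag("ai.response_summary",
                responseSummary.substring(0, Math.min(100, responseSummary.length())));
        }
        span.event("ai.response.completed");
    }
}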
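`substring(0, 100)` counts UTF-16 code units, so the cut can land in the middle of a surrogate pair (an emoji, for instance) and leave a broken character in the span tag. A minimal null-safe helper that backs off by one unit in that case might look like this; `TagText.truncate` is a hypothetical name, not part of Micrometer.

```java
// Hypothetical helper for truncating text before it is stored in a span tag
final class TagText {

    private TagText() {}

    // Returns at most maxChars UTF-16 units without splitting a surrogate pair; null becomes ""
    static String truncate(String text, int maxChars) {
        if (text == null) {
            return "";
        }
        if (text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        // If the last kept unit is a high surrogate, its low half would be cut off, so drop it as well
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }
}
```

With this in place, `recordModelResponse` could call `TagText.truncate(responseSummary, 100)` instead of the raw `substring`.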

4.2 Tracing Integration Example

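import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final ChatClient chatClient;
    private final AiRequestTracer aiTracer;
    private final Tracer tracer;

    public ChatController(ChatClient.Builder chatClientBuilder, AiRequestTracer aiTracer, Tracer tracer) {
        this.chatClient = chatClientBuilder.build();
        this.aiTracer = aiTracer;
        this.tracer = tracer;
    }

    // ChatRequest is the application's own request DTO with a getMessage() accessor
    @PostMapping
    public ResponseEntity<ChatResponse> chat(
            @RequestBody ChatRequest request,
            @RequestHeader(value = "X-User-Id", required = false) String userId) {

        Span span = aiTracer.startAiRequestSpan("chat_completion", "qwen-max");

        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Record request information
            span.tag("user.id", userId != null ? userId : "anonymous");
            span.tag("request.length", String.valueOf(request.getMessage().length()));

            // Call the model through the Spring AI 1.0 fluent ChatClient API
            ChatResponse response = chatClient.prompt()
                .user(request.getMessage())
                .call()
                .chatResponse();

            // Record response information
            Usage usage = response.getMetadata().getUsage();
            int promptTokens = usage != null && usage.getPromptTokens() != null ? usage.getPromptTokens() : 0;
            int completionTokens = usage != null && usage.getCompletionTokens() != null ? usage.getCompletionTokens() : 0;
            aiTracer.recordTokenUsage(span, promptTokens, completionTokens);
            aiTracer.recordModelResponse(span,
                response.getResult().getOutput().getText());

            return ResponseEntity.ok(response);

        } catch (RuntimeException e) {
            span.error(e); // marks the span as failed and records the exception
            throw e;
        } finally {
            span.end();
        }
    }
}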

5. Smart Alerting and Adaptive Control

5.1 Multi-Dimensional Alert Rules

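# alert-rules.yml
groups:
- name: ai-monitoring
  rules:
  # ai_requests_total is only incremented for successful calls (see the aspect in 3.2),
  # so errors are added back into the denominator
  - alert: HighErrorRate
    expr: |
      sum(rate(ai_requests_errors_total[5m]))
        / (sum(rate(ai_requests_total[5m])) + sum(rate(ai_requests_errors_total[5m]))) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "AI service error rate too high"
      description: "Error rate above 10%, current value: {{ $value }}"

  - alert: HighResponseTime
    expr: histogram_quantile(0.95, sum by (le, model) (rate(ai_response_time_seconds_bucket[5m]))) > 10
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "AI service response time too long"
      description: "P95 response time of {{ $labels.model }} above 10 s, current value: {{ $value }}s"

  # The cost meter is a counter, so increase() gives the spend over the window
  - alert: CostSpike
    expr: sum(increase(ai_cost_estimated_total[1h])) > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "AI service cost spiking"
      description: "Cost over the last hour above 100 yuan, current value: {{ $value }} yuan"

  - alert: ModelPerformanceDegradation
    expr: |
      sum(rate(ai_requests_total[10m]))
        / (sum(rate(ai_requests_total[10m])) + sum(rate(ai_requests_errors_total[10m]))) < 0.8
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "Severe model performance degradation"
      description: "Success rate below 80%, current value: {{ $value }}"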
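Because both sides of the `HighErrorRate` ratio are rates over the same window, the window length cancels and the condition reduces to counter increases. The sketch below works through it by hand from two counter snapshots taken five minutes apart, assuming (as in the aspect) that `ai_requests_total` counts only successful calls. `ErrorRatio` and the sample numbers are made up for illustration, and the simple `rate` here ignores the extrapolation and counter-reset handling Prometheus applies.

```java
import java.util.Locale;

// Hypothetical worked example of the HighErrorRate expression from two counter snapshots
final class ErrorRatio {

    private ErrorRatio() {}

    // Per-second rate from two samples of a monotonically increasing counter (no reset handling)
    static double rate(double before, double after, double windowSeconds) {
        return (after - before) / windowSeconds;
    }

    // errors / (successes + errors); an idle window counts as 0
    static double errorRatio(double successRate, double errorRate) {
        double total = successRate + errorRate;
        return total == 0 ? 0 : errorRate / total;
    }

    public static void main(String[] args) {
        double window = 300;                              // [5m]
        double successes = rate(1_000, 1_200, window);    // ai_requests_total grew by 200
        double errors = rate(10, 40, window);             // ai_requests_errors_total grew by 30
        double ratio = errorRatio(successes, errors);     // 30 / 230
        System.out.printf(Locale.ROOT, "error ratio = %.3f, fires = %b%n", ratio, ratio > 0.1);
        // prints: error ratio = 0.130, fires = true
    }
}
```

Even when the ratio exceeds 0.1, Prometheus keeps the alert pending and only fires it once the condition has held for the full `for: 2m`.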

5.2 Adaptive Control Strategy

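import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.micrometer.core.instrument.MeterRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.event.EventListener;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

import java.util.Map;

// Requires @EnableScheduling on a configuration class. CircuitBreakerRegistry comes from Resilience4j
// (for example the resilience4j-spring-boot3 starter). ModelMetrics, AlertEvent (an application event,
// e.g. published by an endpoint receiving Alertmanager webhooks) and the routing helpers
// (getRealTimeMetrics, updateModelWeight, switchToFallbackModel, reduceModelWeight,
// switchToCostEffectiveModel) are application-specific and not shown.
@Service
public class AdaptiveControlService {

    private static final Logger log = LoggerFactory.getLogger(AdaptiveControlService.class);

    private final MeterRegistry meterRegistry;
    private final CircuitBreakerRegistry circuitBreakerRegistry; // one breaker per model, created on first use

    public AdaptiveControlService(MeterRegistry meterRegistry, CircuitBreakerRegistry circuitBreakerRegistry) {
        this.meterRegistry = meterRegistry;
        this.circuitBreakerRegistry = circuitBreakerRegistry;
    }

    @Scheduled(fixedRate = 30_000) // check every 30 seconds
    public void adjustModelRouting() {
        // Fetch real-time metrics for each model
        Map<String, ModelMetrics> metrics = getRealTimeMetrics();

        metrics.forEach((model, metric) -> {
            // Open the circuit when the error rate is too high
            if (metric.getErrorRate() > 0.3) {
                circuitBreakerRegistry.circuitBreaker(model).transitionToOpenState();
                log.warn("Error rate of model {} is too high; circuit opened", model);
            }

            // Dynamically adjust the traffic weight
            double newWeight = calculateOptimalWeight(metric);
            updateModelWeight(model, newWeight);
        });
    }

    private double calculateOptimalWeight(ModelMetrics metric) {
        // Combine response time, error rate and cost into one weight; each score is clamped at 0
        double responseScore = Math.max(0, 1 - metric.getP95ResponseTime() / 30.0); // P95 assumed in seconds
        double errorScore = Math.max(0, 1 - metric.getErrorRate());
        double costScore = Math.max(0, 1 - metric.getCostPerRequest() / 0.1);      // cost assumed in yuan per request

        return responseScore * 0.4 + errorScore * 0.4 + costScore * 0.2;
    }

    @EventListener
    public void handleAlert(AlertEvent event) {
        switch (event.getAlertType()) {
            // Switch automatically to the fallback model
            case "HighErrorRate" -> switchToFallbackModel(event.getModel());
            // Reduce this model's traffic weight
            case "HighResponseTime" -> reduceModelWeight(event.getModel(), 0.5);
            // Switch to a cheaper model
            case "CostSpike" -> switchToCostEffectiveModel();
            default -> log.debug("Unhandled alert type: {}", event.getAlertType());
        }
    }
}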
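To see what the weighting in `calculateOptimalWeight` produces, the sketch below reimplements the same scoring with a small record, plus a weighted pick that routes a request in proportion to the weights. `WeightedRouter`, `ModelStats` and the proportional-selection step are assumptions for illustration; the service above only shows the weights being updated, not how traffic is split by them.

```java
import java.util.Map;

// Hypothetical reimplementation of calculateOptimalWeight plus a proportional (weighted random) pick
final class WeightedRouter {

    // p95Seconds: P95 latency in seconds; costPerRequest in yuan
    record ModelStats(double p95Seconds, double errorRate, double costPerRequest) {}

    private WeightedRouter() {}

    static double weight(ModelStats s) {
        double responseScore = Math.max(0, 1 - s.p95Seconds() / 30.0);  // 0 at 30 s or slower
        double errorScore = Math.max(0, 1 - s.errorRate());
        double costScore = Math.max(0, 1 - s.costPerRequest() / 0.1);   // 0 at 0.1 yuan or more
        return responseScore * 0.4 + errorScore * 0.4 + costScore * 0.2;
    }

    // r must be in [0, 1); walks the cumulative weights and returns the model whose slice contains r.
    // Returns null for an empty map.
    static String pick(Map<String, Double> weights, double r) {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        double target = r * total;
        double cumulative = 0;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (target < cumulative) {
                return e.getKey();
            }
        }
        return last; // only reached through floating-point rounding when r is close to 1
    }
}
```

For example, a model with a 3 s P95, a 5% error rate and 0.02 yuan per request scores 0.9 × 0.4 + 0.95 × 0.4 + 0.8 × 0.2 = 0.90. In production, `r` would come from `ThreadLocalRandom.current().nextDouble()`.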

6. Grafana Dashboards in Practice

6.1 Overall Dashboard Configuration

{
  "dashboard": {
    "title": "Spring AI Application Monitoring",
    "panels": [
      {
        "title": "Requests and errors",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(rate(ai_requests_total[5m]))",
            "legendFormat": "Successful requests"
          },
          {
            "expr": "sum(rate(ai_requests_errors_total[5m]))",
            "legendFormat": "Errors"
          }
        ]
      },
      {
        "title": "Model response time (P95)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le, model) (rate(ai_response_time_seconds_bucket[5m])))",
            "legendFormat": "P95-{{model}}"
          }
        ]
      },
      {
        "title": "Token consumption by model",
        "type": "piechart",
        "targets": [
          {
            "expr": "sum by (model) (ai_tokens_usage_tokens_sum)",
            "legendFormat": "{{model}}"
          }
        ]
      },
      {
        "title": "Estimated cost (last hour)",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(increase(ai_cost_estimated_total[1h]))",
            "legendFormat": "Total cost"
          }
        ]
      }
    ]
  }
}
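The P95 panel and the `HighResponseTime` alert both rely on `histogram_quantile()`, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The simplified sketch below reproduces that estimate for one set of buckets; `HistogramQuantile` is a hypothetical name, the bucket bounds and counts are made-up inputs, and some edge cases Prometheus handles (such as non-positive bucket bounds) are left out.

```java
// Simplified sketch of how histogram_quantile() interpolates within a bucket.
// upperBounds are the "le" values in ascending order (the last one may be +Inf);
// cumulativeCounts[i] is the number of observations <= upperBounds[i].
final class HistogramQuantile {

    private HistogramQuantile() {}

    static double estimate(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        // First bucket whose cumulative count reaches the rank
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }
        if (Double.isInfinite(upperBounds[i])) {
            // Rank falls in the +Inf bucket: report the highest finite bound
            return i > 0 ? upperBounds[i - 1] : Double.NaN;
        }
        double lower = i == 0 ? 0 : upperBounds[i - 1];
        double countBelow = i == 0 ? 0 : cumulativeCounts[i - 1];
        double inBucket = cumulativeCounts[i] - countBelow;
        // Assume the observations are spread evenly across [lower, upper]
        return lower + (upperBounds[i] - lower) * (rank - countBelow) / inBucket;
    }
}
```

With bounds `{1, 2, 5, 10, +Inf}` and cumulative counts `{50, 80, 95, 99, 100}`, the 0.9 quantile falls in the (2, 5] bucket and is estimated as 2 + 3 × 10 / 15 = 4 seconds. The estimate is therefore only as precise as the bucket boundaries around the latency range of interest.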

6.2 Business-Level Monitoring Panel

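import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// BusinessMetrics, ModelComparison, TimeRange, ModelAnalyticsService and the calculate*/get* helpers
// are application-specific and not shown.
@RestController
@RequestMapping("/api/monitoring")
public class BusinessMonitoringController {

    private final ModelAnalyticsService modelAnalyticsService;

    public BusinessMonitoringController(ModelAnalyticsService modelAnalyticsService) {
        this.modelAnalyticsService = modelAnalyticsService;
    }

    @GetMapping("/business-metrics")
    public BusinessMetrics getBusinessMetrics(
            @RequestParam(defaultValue = "24") int hours) {

        Instant since = Instant.now().minus(hours, ChronoUnit.HOURS);

        return BusinessMetrics.builder()
            .successRate(calculateSuccessRate(since))
            .avgResponseTime(calculateAvgResponseTime(since))
            .totalCost(calculateTotalCost(since))
            .topModels(getTopPerformingModels(since, 5))
            .recentIssues(getRecentIssues(since))
            .build();
    }

    // timeframe must name a TimeRange constant (case-insensitive); other values throw IllegalArgumentException
    @GetMapping("/model-comparison")
    public ModelComparison compareModels(
            @RequestParam String timeframe) {

        return modelAnalyticsService.compareModels(
            TimeRange.valueOf(timeframe.toUpperCase(Locale.ROOT)));
    }
}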
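The helpers behind `getBusinessMetrics` are not shown. As one possible shape, assuming per-call records are kept somewhere queryable, the success rate, average response time and total cost over the last N hours reduce to a timestamp filter followed by simple aggregates. `WindowStats`, `CallRecord` and `Summary` are hypothetical names; the same figures could equally be computed from the Prometheus metrics defined earlier.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical per-call record and window aggregation for a business metrics endpoint
final class WindowStats {

    record CallRecord(Instant at, boolean success, long durationMs, double costYuan) {}

    record Summary(double successRate, double avgResponseMs, double totalCostYuan) {}

    private WindowStats() {}

    // Aggregates the calls made at or after 'since'; an empty window yields zeros
    static Summary summarize(List<CallRecord> calls, Instant since) {
        List<CallRecord> window = calls.stream()
            .filter(c -> !c.at().isBefore(since))
            .toList();
        if (window.isEmpty()) {
            return new Summary(0, 0, 0);
        }
        long successes = window.stream().filter(CallRecord::success).count();
        double avgMs = window.stream().mapToLong(CallRecord::durationMs).average().orElse(0);
        double cost = window.stream().mapToDouble(CallRecord::costYuan).sum();
        return new Summary((double) successes / window.size(), avgMs, cost);
    }
}
```

The controller's `since` value maps directly onto the `since` parameter here.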

7. Production Best Practices

7.1 Sampling Strategy for Monitoring Data

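import brave.sampler.Sampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Overrides the sampler Spring Boot derives from management.tracing.sampling.probability.
// Keep only one brave.sampler.Sampler bean (remove the ALWAYS_SAMPLE bean from section 4.1).
@Configuration
public class SamplingConfig {

    @Bean
    public Sampler adaptiveSampler() {
        // Brave's built-in probability samplers
        Sampler half = Sampler.create(0.5f);
        Sampler tenth = Sampler.create(0.1f);
        return new Sampler() {
            @Override
            public boolean isSampled(long traceId) {
                // Important business traffic is sampled at 100%, ordinary requests proportionally.
                // getBusinessTypeFromContext() is an application-specific helper (not shown).
                String businessType = getBusinessTypeFromContext();
                if (businessType == null) {
                    return tenth.isSampled(traceId);
                }
                return switch (businessType) {
                    case "payment", "auth" -> true;                       // critical business: sample all
                    case "search", "recommend" -> half.isSampled(traceId); // 50% sampling
                    default -> tenth.isSampled(traceId);                   // 10% sampling
                };
            }
        };
    }
}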
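A `Math.random()`-style decision is independent on every call. If the decision should be reproducible for a given trace (for example, when the same trace ID is evaluated more than once), it can instead be derived from the trace ID itself. The sketch below applies the same per-business rates that way; `TraceIdSampling` is a hypothetical name, and the modulo scheme is a simple illustration, not Brave's own algorithm.

```java
// Hypothetical deterministic variant of the business-type sampling rule:
// the decision depends only on (businessType, traceId), so repeating it gives the same answer.
final class TraceIdSampling {

    private TraceIdSampling() {}

    static double rateFor(String businessType) {
        if (businessType == null) {
            return 0.1;
        }
        return switch (businessType) {
            case "payment", "auth" -> 1.0;       // critical business: sample everything
            case "search", "recommend" -> 0.5;   // 50%
            default -> 0.1;                      // 10%
        };
    }

    static boolean isSampled(String businessType, long traceId) {
        // floorMod keeps the bucket in [0, 10000) even for negative trace IDs
        long bucket = Math.floorMod(traceId, 10_000L);
        return bucket < rateFor(businessType) * 10_000;
    }
}
```

Because trace IDs are random, the share of sampled traces per business type still approximates the configured rate.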

7.2 Masking Sensitive Data

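import org.springframework.stereotype.Component;

import java.util.List;
import java.util.regex.Pattern;

@Component
public class SensitiveDataFilter {

    // Digit lookarounds are used instead of \b: how \b treats Chinese characters differs across
    // JDK versions, and numbers are often written directly next to Chinese text
    private static final List<Pattern> SENSITIVE_PATTERNS = List.of(
        Pattern.compile("(?<!\\d)\\d{4}[ -]?\\d{4}[ -]?\\d{4}[ -]?\\d{4}(?!\\d)"), // 16-digit bank card number
        Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)"),                          // 18-character ID card number
        Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)")                             // mobile phone number
    );

    public String filterSensitiveData(String text) {
        if (text == null) return null;

        String filtered = text;
        for (Pattern pattern : SENSITIVE_PATTERNS) {
            filtered = pattern.matcher(filtered).replaceAll("***");
        }
        return filtered;
    }
}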
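To check the three patterns against realistic text, including a number written directly after Chinese characters, they can be exercised in plain Java. `MaskingCheck` is a hypothetical name, the sample numbers are invented test strings, and the patterns use digit lookarounds rather than `\b` so the result does not depend on the JDK's word-boundary rules.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone check of the masking patterns (digit lookarounds instead of \b)
final class MaskingCheck {

    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("(?<!\\d)\\d{4}[ -]?\\d{4}[ -]?\\d{4}[ -]?\\d{4}(?!\\d)"), // 16-digit bank card
        Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)"),                          // 18-character ID number
        Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)")                             // mobile number
    );

    private MaskingCheck() {}

    static String mask(String text) {
        if (text == null) {
            return null;
        }
        String result = text;
        for (Pattern p : PATTERNS) {
            result = p.matcher(result).replaceAll("***");
        }
        return result;
    }

    public static void main(String[] args) {
        // The two escapes spell the Chinese word for "phone"; the source stays ASCII-only
        System.out.println(mask("\u7535\u8bdd13812345678, card 6222 0212 3456 7890"));
    }
}
```

Shorter digit runs such as a 12-digit order number are left untouched, because none of the patterns can match them without a digit on either side.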

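Summary

This article walked through building an observability system for Spring AI applications, covering metrics, distributed tracing, and smart alerting. With a complete monitoring setup in place, a team can follow the runtime state of its AI applications in real time, locate problems quickly, and tune both performance and cost, which gives the business a solid foundation for stable operation.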
