Spring AI Alibaba Observability in Practice: Metrics, Tracing, and Logging for AI Applications
Overview: An AI application in production needs a solid observability stack. Token consumption translates directly into cost, response latency shapes user experience, and when something breaks without root-cause visibility, you are reduced to guesswork. This article builds a full-stack observability solution for AI applications, covering metrics, traces, and logs, on top of Micrometer, Prometheus, and OpenTelemetry.
1. What Makes AI Applications Different to Observe
Monitoring for a traditional web application focuses on three dimensions: QPS, latency, and error rate. AI applications add several dimensions of their own:
Traditional web monitoring:
- QPS / latency / error rate / resource utilization
AI applications additionally track:
- Token consumption (maps directly to spend)
- Response quality (hallucination rate, relevance)
- Usage distribution across models
- Prompt/response content auditing
- Tool-call success rate
- RAG retrieval hit rate
Token consumption is the most distinctive of these: it is a direct business cost and must be monitored at fine granularity.
2. Dependencies
xml
<!-- Spring Boot Actuator -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Micrometer + Prometheus -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- OpenTelemetry tracing -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<!-- Spring AI built-in observability support -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-autoconfigure-model-observability</artifactId>
</dependency>
3. Actuator Base Configuration
yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers
  endpoint:
    health:
      show-details: always
      # Health checks include AI model availability
      show-components: always
    prometheus:
      enabled: true
  metrics:
    export:
      prometheus:
        enabled: true
    tags:
      # Global tags: every metric carries the application name and environment
      application: ${spring.application.name}
      environment: ${spring.profiles.active:unknown}

spring:
  ai:
    chat:
      # Enable Micrometer metric collection for AI calls
      observations:
        enabled: true
        # Record prompt content in spans? (keep false in production to avoid leaking private data)
        include-prompt: false
        # Record the full model response?
        include-completion: false
4. Token Usage Metrics: The Core of Cost Monitoring
4.1 Spring AI's Built-in Token Metrics
Spring AI 1.1 already ships with token-related metrics, collected automatically by ChatModelMeterBinder:
# Core metric names (Micrometer format)
gen_ai.client.token.usage            # tokens consumed
  tags:
    ai.operation.name     = "chat"
    gen_ai.response.model = "qwen-turbo"
    gen_ai.token.type     = "input" / "output" / "total"
gen_ai.client.operation.duration     # model call latency (seconds)
  tags:
    ai.operation.name     = "chat"
    gen_ai.response.model = "qwen-turbo"
    gen_ai.request.model  = "qwen-turbo"
gen_ai.client.operation.error        # error count
  tags:
    error.type = "TimeoutException"
Hitting /actuator/prometheus shows these metrics:
# HELP gen_ai_client_token_usage_tokens
# TYPE gen_ai_client_token_usage_tokens counter
gen_ai_client_token_usage_tokens_total{
  ai_operation_name="chat",
  gen_ai_response_model="qwen-turbo",
  gen_ai_token_type="input"
} 15234.0
gen_ai_client_token_usage_tokens_total{
  ai_operation_name="chat",
  gen_ai_response_model="qwen-turbo",
  gen_ai_token_type="output"
} 8921.0
4.2 Custom Cost Metrics
Building on the built-in metrics, add a cost dimension:
java
@Component
@Slf4j
public class AiCostMeterBinder implements MeterBinder {

    // Per-model token prices (yuan per 1K tokens)
    private static final Map<String, Double> INPUT_PRICE = Map.of(
            "qwen-turbo", 0.002,
            "qwen-plus", 0.004,
            "qwen-max", 0.04,
            "qwen-long", 0.0005,
            "gpt-4o-mini", 0.03
    );
    private static final Map<String, Double> OUTPUT_PRICE = Map.of(
            "qwen-turbo", 0.006,
            "qwen-plus", 0.012,
            "qwen-max", 0.12,
            "qwen-long", 0.002,
            "gpt-4o-mini", 0.09
    );

    private MeterRegistry registry;

    @Override
    public void bindTo(@NonNull MeterRegistry registry) {
        this.registry = registry;
        // Register the cost gauge (sampled on every scrape)
        Gauge.builder("ai.cost.estimate.yuan", this, AiCostMeterBinder::getTotalCost)
                .description("Estimated total cost of AI calls (yuan)")
                .register(registry);
    }

    /**
     * Record the token cost of one model call.
     */
    public void recordUsage(String model, long inputTokens, long outputTokens) {
        double inputCost = inputTokens / 1000.0 *
                INPUT_PRICE.getOrDefault(model, 0.01);
        double outputCost = outputTokens / 1000.0 *
                OUTPUT_PRICE.getOrDefault(model, 0.02);
        Counter.builder("ai.cost.tokens")
                .tag("model", model)
                .tag("token_type", "input")
                .description("Input tokens consumed")
                .register(registry)
                .increment(inputTokens);
        Counter.builder("ai.cost.tokens")
                .tag("model", model)
                .tag("token_type", "output")
                .register(registry)
                .increment(outputTokens);
        log.info("[Cost] model={}, inputTokens={}, outputTokens={}, callCost=¥{}",
                model, inputTokens, outputTokens,
                String.format("%.6f", inputCost + outputCost));
    }

    private double getTotalCost() {
        // In a real project, accumulate cost in an AtomicLong holding micro-yuan
        // (avoids floating-point drift), or read the running total from a database/cache
        return 0.0;
    }
}
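The pricing arithmetic is easy to sanity-check in isolation. A minimal sketch in plain Java (the `CostDemo` class and `estimate` helper are hypothetical; the prices mirror the tables above): 1500 input tokens plus 500 output tokens on qwen-plus should cost 1500/1000 × 0.004 + 500/1000 × 0.012 = ¥0.012.

```java
import java.util.Map;

public class CostDemo {
    // Hypothetical price table mirroring the binder above: {input, output} yuan per 1K tokens
    static final Map<String, double[]> PRICE = Map.of(
            "qwen-plus", new double[]{0.004, 0.012},
            "qwen-turbo", new double[]{0.002, 0.006});

    static double estimate(String model, long inputTokens, long outputTokens) {
        // Unknown models fall back to the same defaults used in recordUsage()
        double[] p = PRICE.getOrDefault(model, new double[]{0.01, 0.02});
        return inputTokens / 1000.0 * p[0] + outputTokens / 1000.0 * p[1];
    }

    public static void main(String[] args) {
        // 1.5K input + 0.5K output on qwen-plus -> 0.006 + 0.006 yuan
        System.out.printf("%.6f%n", estimate("qwen-plus", 1500, 500)); // prints 0.012000
    }
}
```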
5. Distributed Tracing: OpenTelemetry Integration
5.1 OpenTelemetry Configuration
yaml
management:
  tracing:
    sampling:
      probability: 1.0   # 100% sampling in development; 0.1 recommended in production
otel:
  exporter:
    otlp:
      # Export to Jaeger / SkyWalking / Zipkin, etc.
      endpoint: http://localhost:4318
      protocol: http/protobuf
  traces:
    sampler:
      type: parentbased_traceidratio
      ratio: 0.1
5.2 Propagating the TraceId into AI Calls
java
@Slf4j
@Component
public class TraceAwareAdvisor implements RequestResponseAdvisor {

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request,
                                        Map<String, Object> context) {
        // Grab the current trace's TraceId and stash it in the advisor context
        Span currentSpan = Span.current();
        if (currentSpan.getSpanContext().isValid()) {
            String traceId = currentSpan.getSpanContext().getTraceId();
            context.put("traceId", traceId);
            log.debug("[Trace] AI call, traceId={}", traceId);
        }
        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response,
                                       Map<String, Object> context) {
        // Attach custom AI attributes to the active span
        Span currentSpan = Span.current();
        if (response.getMetadata() != null) {
            var usage = response.getMetadata().getUsage();
            if (usage != null) {
                // Record token usage on the trace span
                currentSpan.setAttribute("gen_ai.input_tokens", usage.getPromptTokens());
                currentSpan.setAttribute("gen_ai.output_tokens", usage.getGenerationTokens());
            }
        }
        return response;
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;
    }
}
5.3 Custom Span Attributes for LLM Calls
java
@Service
@RequiredArgsConstructor
public class ObservableChatService {

    private final ChatClient chatClient;
    private final Tracer tracer;

    /**
     * AI call with explicit tracing.
     */
    public String tracedChat(String message, String userId) {
        // Start a new span marked as an LLM call
        Span span = tracer.spanBuilder("llm.chat")
                .setAttribute("llm.vendor", "aliyun")
                .setAttribute("llm.model", "qwen-plus")
                .setAttribute("user.id", userId)
                // In production, record prompt content only after desensitization;
                // here we record just its length
                .setAttribute("llm.prompt.length", message.length())
                .startSpan();
        try (Scope scope = span.makeCurrent()) {
            String response = chatClient.prompt(message).call().content();
            span.setAttribute("llm.response.length", response.length());
            span.setStatus(StatusCode.OK);
            return response;
        } catch (Exception e) {
            span.recordException(e);
            span.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        } finally {
            span.end();
        }
    }
}
6. Structured Logging: Prompt Desensitization and Safety Filtering
java
@Component
@Slf4j
public class StructuredLogAdvisor implements RequestResponseAdvisor {

    // Regexes for sensitive data (phone number, national ID, bank card, API keys)
    private static final Pattern PHONE_PATTERN =
            Pattern.compile("1[3-9]\\d{9}");
    private static final Pattern ID_CARD_PATTERN =
            Pattern.compile("\\d{17}[\\dXx]");
    private static final Pattern BANK_CARD_PATTERN =
            Pattern.compile("\\d{16,19}");
    private static final Pattern API_KEY_PATTERN =
            Pattern.compile("(?i)(api[-_]?key|token|secret)[\\s:=]+[\\w-]+");

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request,
                                        Map<String, Object> context) {
        // Structured log of the request (after desensitization)
        String userText = desensitize(request.userText());
        log.info("""
                [AI Request] model={} userPromptLength={} systemPromptLength={}
                userPromptPreview='{}'
                """,
                request.chatOptions() != null
                        ? request.chatOptions().getModel() : "default",
                userText.length(),
                request.systemText() != null ? request.systemText().length() : 0,
                truncate(userText, 100)   // log only the first 100 characters
        );
        context.put("requestTime", System.currentTimeMillis());
        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response,
                                       Map<String, Object> context) {
        long duration = System.currentTimeMillis()
                - (long) context.getOrDefault("requestTime", System.currentTimeMillis());
        String content = response.getResult() != null
                ? response.getResult().getOutput().getContent() : "";
        log.info("""
                [AI Response] duration={}ms contentLength={} model={}
                inputTokens={} outputTokens={}
                """,
                duration,
                content != null ? content.length() : 0,
                response.getMetadata() != null ? response.getMetadata().getModel() : "unknown",
                response.getMetadata() != null && response.getMetadata().getUsage() != null
                        ? response.getMetadata().getUsage().getPromptTokens() : 0,
                response.getMetadata() != null && response.getMetadata().getUsage() != null
                        ? response.getMetadata().getUsage().getGenerationTokens() : 0
        );
        return response;
    }

    /**
     * Mask sensitive data before logging.
     */
    private String desensitize(String text) {
        if (text == null) return "";
        return text
                .replaceAll(PHONE_PATTERN.pattern(), "***PHONE***")
                .replaceAll(ID_CARD_PATTERN.pattern(), "***IDCARD***")
                .replaceAll(BANK_CARD_PATTERN.pattern(), "***BANKCARD***")
                .replaceAll(API_KEY_PATTERN.pattern(), "$1=***APIKEY***");
    }

    private String truncate(String text, int maxLength) {
        if (text == null) return "";
        return text.length() > maxLength
                ? text.substring(0, maxLength) + "..." : text;
    }

    @Override
    public int getOrder() {
        return Integer.MIN_VALUE;   // highest precedence: mask before anything else logs
    }
}
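To see the masking behavior outside of Spring, the same regexes can be exercised in a few lines of plain Java (the `DesensitizeDemo` class is a hypothetical standalone harness; the patterns are the ones used by the advisor above):

```java
public class DesensitizeDemo {
    // Same ordering as the advisor: phone, then ID card, then bank card,
    // so an 18-digit ID is not swallowed by the 16-19 digit bank-card rule
    static String desensitize(String text) {
        return text
                .replaceAll("1[3-9]\\d{9}", "***PHONE***")
                .replaceAll("\\d{17}[\\dXx]", "***IDCARD***")
                .replaceAll("\\d{16,19}", "***BANKCARD***");
    }

    public static void main(String[] args) {
        System.out.println(desensitize("Call me at 13812345678 today"));
        // prints: Call me at ***PHONE*** today
    }
}
```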
7. Health Checks: Probing Model Availability
java
@Slf4j
@Component
@RequiredArgsConstructor
public class AiHealthIndicator implements HealthIndicator {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    @Override
    public Health health() {
        Health.Builder builder = new Health.Builder();
        // Check DashScope model availability
        boolean modelHealthy = checkModelHealth();
        // Check vector-store connectivity
        boolean vectorStoreHealthy = checkVectorStoreHealth();
        builder.withDetail("dashscope", modelHealthy ? "UP" : "DOWN")
                .withDetail("vectorStore", vectorStoreHealthy ? "UP" : "DOWN");
        if (modelHealthy && vectorStoreHealthy) {
            return builder.up().build();
        } else if (modelHealthy) {
            // A vector-store outage does not break basic chat
            return builder.status("DEGRADED").build();
        } else {
            return builder.down().build();
        }
    }

    private boolean checkModelHealth() {
        try {
            String result = chatClient.prompt("ping")
                    .options(DashScopeChatOptions.builder()
                            .withModel("qwen-turbo")
                            .withMaxTokens(10)   // shortest possible reply, saves tokens
                            .build())
                    .call()
                    .content();
            return result != null && !result.isEmpty();
        } catch (Exception e) {
            log.warn("Model health check failed: {}", e.getMessage());
            return false;
        }
    }

    private boolean checkVectorStoreHealth() {
        try {
            // Run a minimal-cost search to verify connectivity
            vectorStore.similaritySearch(
                    SearchRequest.defaults().withTopK(1));
            return true;
        } catch (Exception e) {
            log.warn("Vector store health check failed: {}", e.getMessage());
            return false;
        }
    }
}
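The aggregation rule in `health()` boils down to a three-way mapping from the two probe results. Distilled into plain Java (the `HealthDemo` class and `Status` enum are hypothetical stand-ins for Spring's `Health` statuses):

```java
public class HealthDemo {
    enum Status { UP, DEGRADED, DOWN }

    // A vector-store failure only degrades RAG features; plain chat keeps working,
    // so the model probe alone decides between DEGRADED and DOWN
    static Status aggregate(boolean modelHealthy, boolean vectorStoreHealthy) {
        if (modelHealthy && vectorStoreHealthy) return Status.UP;
        if (modelHealthy) return Status.DEGRADED;
        return Status.DOWN;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(true, false)); // prints DEGRADED
    }
}
```

Exposing DEGRADED as a distinct status lets load balancers keep routing chat traffic while paging the on-call for the retrieval path.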
8. Prometheus Alerting Rules
The following Prometheus rules cover the failure scenarios AI applications hit most often:
yaml
# prometheus-rules.yml
groups:
  - name: ai-application-alerts
    rules:
      # Alert 1: token usage spike (possible runaway loop or abuse)
      - alert: AiTokenUsageSpike
        expr: |
          rate(gen_ai_client_token_usage_tokens_total[5m]) > 10000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Abnormal AI token consumption rate"
          description: "Token consumption exceeds 10000 tokens/s (5m average); check for runaway calls"
      # Alert 2: slow AI responses (P99 > 30s)
      - alert: AiResponseLatencyHigh
        expr: |
          histogram_quantile(0.99,
            rate(gen_ai_client_operation_duration_seconds_bucket[5m])) > 30
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "AI service latency too high"
          description: "P99 latency of AI calls exceeds 30 seconds; user experience is likely degraded"
      # Alert 3: error rate above threshold
      - alert: AiErrorRateHigh
        expr: |
          rate(gen_ai_client_operation_error_total[5m]) /
          rate(gen_ai_client_operation_duration_seconds_count[5m]) > 0.05
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "AI service error rate too high"
          description: "AI call error rate exceeds 5%; check network connectivity and API key status"
      # Alert 4: daily cost over budget (rough flat-rate estimate of 0.001 yuan/token)
      - alert: AiDailyCostOverBudget
        expr: |
          increase(ai_cost_tokens_total[24h]) * 0.001 > 1000
        labels:
          severity: warning
        annotations:
          summary: "Daily AI spend over budget"
          description: "Estimated AI spend today exceeds 1000 yuan; check for abnormal usage"
9. Core Grafana Dashboard Panels
Recommended dashboard layout (four rows):
+----------------------+----------------------+----------------------+
| Live QPS             | P50 / P99 latency    | Error rate           |
+----------------------+----------------------+----------------------+
| Tokens per minute    | Usage by model (pie) | Estimated cost today |
+----------------------+----------------------+----------------------+
| Call-count heatmap   | Tool-call success %  | RAG hit rate         |
+----------------------+----------------------+----------------------+
| Last 10 error logs   | Health check status  | Active sessions      |
+----------------------+----------------------+----------------------+
Key PromQL queries:
promql
# QPS
rate(gen_ai_client_operation_duration_seconds_count[1m])

# P99 latency
histogram_quantile(0.99, rate(gen_ai_client_operation_duration_seconds_bucket[5m]))

# Token consumption rate, grouped by model
sum by (gen_ai_response_model) (rate(gen_ai_client_token_usage_tokens_total[5m]))

# Error rate
rate(gen_ai_client_operation_error_total[5m])
  / rate(gen_ai_client_operation_duration_seconds_count[5m])
10. Summary
An AI observability stack is built in three layers:
- Metrics (Micrometer + Prometheus): token consumption, call latency, error rate, model distribution;
- Traces (OpenTelemetry + Jaeger/Zipkin): the complete call chain of every AI invocation, for quickly pinpointing slow calls;
- Logs (structured + desensitized): prompt/response JSON logs with sensitive data masked, ready for search and analysis.
None of the three is optional; together they form the operational brain of an AI application.
The next article moves into agent development: implementing ReAct, Plan-And-Execute, and other agent architectures with Spring AI Alibaba 1.1.