Spring AI 2.0 Observability: AI Operation Tracing, Metrics Monitoring, and the Evaluation Framework

Abstract: When AI applications go to production in the enterprise, observability is the core infrastructure for keeping the system stable, controlling cost, and assessing output quality. Spring AI 2.0 ships a complete observability stack covering distributed tracing, metrics, token-usage analysis, Advisor-chain profiling, and an LLM-based response-quality evaluation (EVAL) framework. This article walks through how these capabilities work and how to use them in practice, from a senior architect's perspective.

1. Why Do AI Applications Need Observability?

Monitoring for traditional Java applications (APM) is mature, but AI applications introduce new challenges:

Dimension            Traditional microservices    AI applications
Call latency         Tens of milliseconds         Hundreds of ms to several seconds
Cost model           Relatively fixed             Billed per token; highly variable
Quality assessment   Deterministic results        Needs semantic evaluation; hallucination risk
Execution path       Synchronous calls            Chained Advisor calls; higher complexity

A production-grade AI application must be able to answer questions such as:

  • What is the latency distribution of AI calls? What is the P99?
  • How many tokens does each conversation consume, and what does it cost?
  • Are the model's answers reliable? Are there hallucinations?
  • Which step in the Advisor chain is the performance bottleneck?

Spring AI 2.0's observability stack is designed to answer exactly these questions.

2. Overall Architecture of Spring AI Observability

+------------------------------------------------------------------+
|                        Spring AI 2.0                              |
+------------------------------------------------------------------+
|  +----------------+  +----------------+  +--------------------+  |
|  |   Metrics      |  |   Tracing      |  |    EVAL            |  |
|  |  (Micrometer)  |  | (Zipkin/Jaeger)|  | (Quality Assess)   |  |
|  +--------+-------+  +--------+-------+  +---------+---------+  |
|           |                    |                    |           |
|           v                    v                    v           |
|  +--------------------------------------------------+           |
|  |              ObservationRegistry                |           |
|  +--------------------------------------------------+           |
+------------------------------------------------------------------+
           |                    |                    |
           v                    v                    v
     [Prometheus]          [Zipkin]            [LLM Evaluator]
     [Grafana]             [Jaeger]

Spring AI builds on Micrometer's ObservationRegistry to manage observability in one place: all monitoring data flows through the standard Observation API and can be exported to multiple backends.
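To make this concrete, here is a minimal sketch of wrapping an AI call in a Micrometer Observation by hand. The observation name `ai.chat.call`, the tag keys, and `callModel` are illustrative assumptions, not Spring AI's built-in conventions; every handler registered on the registry (metrics, tracing, logging) receives the start/stop/error callbacks for the observation.

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

public class ObservationSketch {

    public static String observedChatCall(ObservationRegistry registry, String userText) {
        return Observation.createNotStarted("ai.chat.call", registry)
                // bounded value -> safe to use as a metrics tag
                .lowCardinalityKeyValue("ai.model", "gpt-4o")
                // unbounded value -> exported to traces only
                .highCardinalityKeyValue("ai.prompt.preview",
                        userText.substring(0, Math.min(20, userText.length())))
                // wraps the actual call: start, stop, and error events fire around it
                .observe(() -> callModel(userText));
    }

    private static String callModel(String userText) {
        // Placeholder for the real ChatModel call
        return "echo: " + userText;
    }
}
```

The same `ObservationRegistry` bean that Spring Boot auto-configures is what Spring AI's own instrumentation reports into, so custom observations like this show up alongside the framework's.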

3. Token Usage Monitoring: The Foundation of Enterprise Cost Control

3.1 Token Counting with JTokkit

Spring AI 2.0 ships a token-count estimator backed by JTokkit, a Java implementation of OpenAI's tiktoken BPE encodings and one of the more accurate estimation libraries available on the JVM.

java
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.UserMessage;

import java.util.List;

// Create the token-count estimator
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator();

// Count tokens for a single message
UserMessage userMessage = new UserMessage("Explain Spring Boot auto-configuration in detail");
int userTokens = estimator.estimate(userMessage);
System.out.println("User message tokens: " + userTokens);

// Count tokens for an assistant response
String assistantResponse = "The core idea of Spring Boot auto-configuration is...";
int assistantTokens = estimator.estimate(assistantResponse);
System.out.println("Assistant response tokens: " + assistantTokens);

// Count tokens for a full conversation (including message history)
List<org.springframework.ai.chat.messages.Message> messages = List.of(
    new UserMessage("What is dependency injection?"),
    new AssistantMessage("Dependency injection is a design pattern..."),
    new UserMessage("Please elaborate"),
    new AssistantMessage("In more detail...")
);
int totalTokens = estimator.estimate(messages);
System.out.println("Total conversation tokens: " + totalTokens);

3.2 Unified Token Counting via the Content Interface

Spring AI's Content interface gives messages and documents a single, uniform way to count tokens:

java
import org.springframework.ai.content.Content;

import java.util.Map;

// A custom document type implementing the Content interface
public class MyDocument implements Content {
    private String text;
    private Map<String, Object> metadata;
    
    // getters and setters
    
    @Override
    public String getText() {
        return this.text;
    }
    
    @Override
    public Map<String, Object> getMetadata() {
        return this.metadata;
    }
}

// Budgeting the total tokens of retrieved documents in a RAG pipeline
public class TokenBudgetManager {
    private final JTokkitTokenCountEstimator estimator;
    private final int maxTokens;
    
    public TokenBudgetManager(int maxTokens) {
        this.estimator = new JTokkitTokenCountEstimator();
        this.maxTokens = maxTokens;
    }
    
    public boolean canFitInContext(List<Content> documents) {
        int totalTokens = documents.stream()
            .mapToInt(doc -> estimator.estimate(doc.getText()))
            .sum();
        return totalTokens <= maxTokens;
    }
    
    public List<Content> fitDocuments(List<Content> documents) {
        List<Content> fitting = new ArrayList<>();
        int currentTokens = 0;
        
        for (Content doc : documents) {
            int docTokens = estimator.estimate(doc.getText());
            if (currentTokens + docTokens <= maxTokens) {
                fitting.add(doc);
                currentTokens += docTokens;
            } else {
                break; // over budget: stop adding documents
            }
        }
        return fitting;
    }
}
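The budget-fitting logic can be exercised standalone. The sketch below swaps JTokkit for the rough "~4 characters per token" heuristic (an assumption for illustration only, not a real estimator), which makes the greedy cutoff behavior easy to verify:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenBudgetSketch {

    // Crude stand-in for JTokkitTokenCountEstimator: ~4 chars per token
    static int estimate(String text) {
        return Math.max(1, text.length() / 4);
    }

    // Greedily keep documents, in order, until the token budget is exhausted
    static List<String> fitDocuments(List<String> documents, int maxTokens) {
        List<String> fitting = new ArrayList<>();
        int used = 0;
        for (String doc : documents) {
            int docTokens = estimate(doc);
            if (used + docTokens > maxTokens) {
                break; // over budget: stop adding
            }
            fitting.add(doc);
            used += docTokens;
        }
        return fitting;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a".repeat(40), "b".repeat(40), "c".repeat(40));
        // Each doc estimates to 10 tokens; a 25-token budget fits only the first two
        System.out.println(fitDocuments(docs, 25).size()); // prints 2
    }
}
```

Note the greedy strategy keeps documents in retrieval order; if the retriever returns results ranked by relevance, this preserves the most relevant documents under the budget.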

4. Micrometer Metrics: Full Visibility into Latency, Tokens, and Cost

4.1 Spring AI Metrics Auto-Configuration

Spring AI 2.0 relies on spring-boot-starter-actuator and Micrometer to expose AI-related metrics automatically:

xml
<!-- pom.xml dependencies -->
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-spring-boot-starter</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>
yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health
  metrics:
    tags:
      application: ${spring.application.name}
  prometheus:
    metrics:
      export:
        enabled: true   # Spring Boot 3.x property path (formerly management.metrics.export.prometheus)
  observations:
    annotations:
      enabled: true     # enable @Observed annotation support

4.2 Custom AI Metric Instrumentation

Enterprise scenarios usually call for finer-grained custom metrics:

java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import reactor.core.publisher.SignalType;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Component
public class AIMetricsCollector {
    
    private final MeterRegistry registry;
    private final Map<String, AtomicLong> tokenUsageCache = new ConcurrentHashMap<>();
    private final Map<String, Timer> callTimers = new ConcurrentHashMap<>();
    private final Map<String, Counter> errorCounters = new ConcurrentHashMap<>();
    
    // Per-model pricing in USD per million tokens. Map.of rejects duplicate
    // keys, so input and output prices live in separate maps.
    private static final Map<String, Double> INPUT_PRICING = Map.of(
        "gpt-4o", 5.0,
        "gpt-4o-mini", 0.15,
        "claude-3-5-sonnet", 3.0
    );
    private static final Map<String, Double> OUTPUT_PRICING = Map.of(
        "gpt-4o", 15.0,
        "gpt-4o-mini", 0.6,
        "claude-3-5-sonnet", 15.0
    );
    
    public AIMetricsCollector(MeterRegistry registry) {
        this.registry = registry;
        initializeMetrics();
    }
    
    private void initializeMetrics() {
        // Register a Gauge for total token usage
        Gauge.builder("ai.token.usage.total", tokenUsageCache, 
            map -> map.values().stream()
                .mapToLong(AtomicLong::get)
                .sum())
            .tag("type", "all")
            .description("Total tokens used")
            .register(registry);
    }
    
    /**
     * Records the full set of metrics for one AI call.
     */
    public void recordCall(String model, String operation, 
                          int promptTokens, int completionTokens,
                          long latencyMs, boolean success) {
        
        String modelKey = model.replace(":", "_");
        
        // 1. Latency
        Timer timer = callTimers.computeIfAbsent(operation + "_" + modelKey, k -> 
            Timer.builder("ai.call.latency")
                .tag("model", model)
                .tag("operation", operation)
                .description("AI call latency")
                .register(registry));
        
        timer.record(Duration.ofMillis(latencyMs));
        
        // 2. Token usage
        if (promptTokens > 0) {
            Counter.builder("ai.token.prompt")
                .tag("model", model)
                .tag("operation", operation)
                .register(registry)
                .increment(promptTokens);
        }
        
        if (completionTokens > 0) {
            Counter.builder("ai.token.completion")
                .tag("model", model)
                .tag("operation", operation)
                .register(registry)
                .increment(completionTokens);
        }
        
        // 3. Cost (USD)
        double cost = calculateCost(model, promptTokens, completionTokens);
        Counter.builder("ai.cost.usd")
            .tag("model", model)
            .tag("operation", operation)
            .register(registry)
            .increment(cost);
        
        // 4. Success/failure counters
        if (success) {
            Counter.builder("ai.call.count")
                .tag("model", model)
                .tag("operation", operation)
                .tag("status", "success")
                .register(registry)
                .increment();
        } else {
            Counter errorCounter = errorCounters.computeIfAbsent(
                operation + "_" + modelKey, k -> 
                Counter.builder("ai.call.errors")
                    .tag("model", model)
                    .tag("operation", operation)
                    .description("AI call errors")
                    .register(registry));
            errorCounter.increment();
        }
    }
    
    private double calculateCost(String model, int promptTokens, int completionTokens) {
        // Simplified: production code should look up the per-model pricing tables
        double inputPrice = 5.0 / 1_000_000;   // $5 per 1M input tokens
        double outputPrice = 15.0 / 1_000_000; // $15 per 1M output tokens
        
        return (promptTokens * inputPrice) + (completionTokens * outputPrice);
    }
    
    /**
     * Wraps a ChatModel to collect metrics automatically. Declared static
     * so it can be constructed with an explicit delegate rather than
     * auto-wired as a nested bean.
     */
    public static class InstrumentedChatModel implements ChatModel {
        
        private final ChatModel delegate;
        private final AIMetricsCollector metrics;
        
        public InstrumentedChatModel(ChatModel delegate, AIMetricsCollector metrics) {
            this.delegate = delegate;
            this.metrics = metrics;
        }
        
        @Override
        public ChatResponse call(Prompt prompt) {
            long startTime = System.currentTimeMillis();
            boolean success = false;
            int promptTokens = 0;
            int completionTokens = 0;
            
            try {
                ChatResponse response = delegate.call(prompt);
                success = true;
                
                // Token usage could be extracted from the response metadata here;
                // the exact shape depends on the model provider
                return response;
            } finally {
                long latency = System.currentTimeMillis() - startTime;
                metrics.recordCall(
                    getDefaultModel(),
                    "sync_call",
                    promptTokens,
                    completionTokens,
                    latency,
                    success
                );
            }
        }
        
        @Override
        public Flux<ChatResponse> stream(Prompt prompt) {
            long startTime = System.currentTimeMillis();
            
            return delegate.stream(prompt)
                .doFinally(signalType -> {
                    long latency = System.currentTimeMillis() - startTime;
                    metrics.recordCall(
                        getDefaultModel(),
                        "stream_call",
                        0, 0,  // token counts are hard to capture for streams
                        latency,
                        // doFinally's signal is never null; only a normal
                        // completion counts as success
                        signalType == reactor.core.publisher.SignalType.ON_COMPLETE
                    );
                });
        }
        
        private String getDefaultModel() {
            // Read the default model name from configuration
            return "gpt-4o";
        }
    }
}

4.3 Grafana Dashboards

The core panels of a production-grade Grafana dashboard, as JSON:

json
{
  "panels": [
    {
      "title": "AI Call Latency Distribution",
      "type": "timeseries",
      "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
          "legendFormat": "P50 - {{model}}"
        },
        {
          "expr": "histogram_quantile(0.95, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
          "legendFormat": "P95 - {{model}}"
        },
        {
          "expr": "histogram_quantile(0.99, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
          "legendFormat": "P99 - {{model}}"
        }
      ]
    },
    {
      "title": "Token Usage Trend",
      "type": "timeseries",
      "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
      "targets": [
        {
          "expr": "sum(increase(ai_token_prompt_total[1h])) by (model)",
          "legendFormat": "Input - {{model}}"
        },
        {
          "expr": "sum(increase(ai_token_completion_total[1h])) by (model)",
          "legendFormat": "Output - {{model}}"
        }
      ]
    },
    {
      "title": "AI Cost (Real-Time)",
      "type": "stat",
      "gridPos": {"x": 0, "y": 8, "w": 6, "h": 4},
      "targets": [
        {
          "expr": "sum(ai_cost_usd_total{service=\"chat-service\"})",
          "legendFormat": "Total Cost"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "currencyUSD",
          "thresholds": {
            "steps": [
              {"value": 0, "color": "green"},
              {"value": 100, "color": "yellow"},
              {"value": 500, "color": "red"}
            ]
          }
        }
      }
    },
    {
      "title": "Call Success Rate",
      "type": "gauge",
      "gridPos": {"x": 6, "y": 8, "w": 6, "h": 4},
      "targets": [
        {
          "expr": "sum(rate(ai_call_count_total{service=\"chat-service\",status=\"success\"}[5m])) / sum(rate(ai_call_count_total{service=\"chat-service\"}[5m])) * 100"
        }
      ]
    },
    {
      "title": "Calls by Model",
      "type": "piechart",
      "gridPos": {"x": 12, "y": 8, "w": 6, "h": 4},
      "targets": [
        {
          "expr": "sum(increase(ai_call_count_total{service=\"chat-service\"}[1h])) by (model)"
        }
      ]
    }
  ]
}

5. Distributed Tracing: Zipkin/Jaeger Integration

5.1 Tracing Auto-Configuration

The examples below integrate with Spring Cloud Sleuth and Zipkin. (Note: Sleuth targets Spring Boot 2.x; on Spring Boot 3.x the same concepts are provided by Micrometer Tracing, with a largely equivalent API.)

xml
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-sleuth-zipkin</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>
yaml
spring:
  application:
    name: chat-service
  sleuth:
    sampler:
      probability: 1.0  # use 0.1-0.3 in production
  zipkin:
    base-url: http://localhost:9411
    sender:
      type: web

5.2 Custom Trace Context

In complex enterprise scenarios, business context needs to be propagated into the AI traces:

java
import org.springframework.cloud.sleuth.BaggageInScope;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import reactor.core.publisher.Flux;

import java.util.Map;
import java.util.HashMap;

@Component
public class AITracingService {
    
    private final Tracer tracer;
    private final ChatModel chatModel;
    
    public AITracingService(Tracer tracer, ChatModel chatModel) {
        this.tracer = tracer;
        this.chatModel = chatModel;
    }
    
    /**
     * AI call with tracing.
     */
    public ChatResponse callWithTrace(Prompt prompt, Map<String, String> businessContext) {
        
        // Sleuth's startScopedSpan returns a ScopedSpan, not a Span;
        // nextSpan()/withSpan() keeps the full Span API available
        Span span = tracer.nextSpan().name("ai.chat.call").start();
        
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Business tags
            span.tag("ai.model", "gpt-4o");
            span.tag("ai.operation", "chat");
            businessContext.forEach(span::tag);
            
            span.event("start_chat_call");
            
            long startTime = System.currentTimeMillis();
            ChatResponse response = chatModel.call(prompt);
            long callDuration = System.currentTimeMillis() - startTime;
            
            // Record the duration
            span.event("chat_completed");
            span.tag("ai.duration.ms", String.valueOf(callDuration));
            
            // Record token usage when available, e.g.:
            // span.tag("ai.tokens.prompt", String.valueOf(promptTokens));
            // span.tag("ai.tokens.completion", String.valueOf(completionTokens));
            
            return response;
            
        } catch (Exception e) {
            span.error(e);
            span.tag("ai.error", e.getClass().getSimpleName());
            throw e;
        } finally {
            span.end();
        }
    }
    
    /**
     * Streaming call with tracing.
     */
    public Flux<ChatResponse> streamWithTrace(Prompt prompt) {
        
        Span span = tracer.nextSpan().name("ai.chat.stream").start();
        span.tag("ai.model", "gpt-4o");
        span.tag("ai.operation", "stream");
        
        return chatModel.stream(prompt)
            .doOnComplete(() -> span.event("stream_completed"))
            .doOnError(span::error)
            .doFinally(signal -> span.end());
    }
    
    /**
     * Cross-service tracing: propagate the traceId when calling external AI services.
     */
    public Map<String, String> getTracingHeaders() {
        Map<String, String> headers = new HashMap<>();
        
        // Current trace/span identifiers
        Span currentSpan = tracer.currentSpan();
        if (currentSpan != null) {
            headers.put("X-B3-TraceId", currentSpan.context().traceId());
            headers.put("X-B3-SpanId", currentSpan.context().spanId());
            String parentId = currentSpan.context().parentId();
            if (parentId != null) { // root spans have no parent
                headers.put("X-B3-ParentSpanId", parentId);
            }
        }
        
        // Baggage entries
        try (BaggageInScope baggage = tracer.getBaggage("user-id")) {
            if (baggage != null) {
                headers.put("X-User-Id", baggage.getValue());
            }
        }
        
        return headers;
    }
}

5.3 Tracing the Advisor Chain

The Advisor chain is a core feature of Spring AI 2.0, and each Advisor's execution can be traced:

java
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;

/**
 * Advisor chain configuration with tracing.
 */
@Configuration
public class TracedAdvisorConfiguration {
    
    @Bean
    public Advisor tracedChatMemoryAdvisor(ChatMemory chatMemory, Tracer tracer) {
        // Wrap the original Advisor to add tracing
        return new TracingAdvisor(
            MessageChatMemoryAdvisor.builder(chatMemory)
                .build(),
            "chat_memory",
            tracer
        );
    }
    
    @Bean
    public Advisor tracedQuestionAnswerAdvisor(VectorStore vectorStore, Tracer tracer) {
        return new TracingAdvisor(
            QuestionAnswerAdvisor.builder(vectorStore)
                .build(),
            "question_answer",
            tracer
        );
    }
    
    /**
     * Tracing decorator; the Tracer is injected explicitly.
     */
    static class TracingAdvisor implements Advisor {
        
        private final Advisor delegate;
        private final String name;
        private final Tracer tracer;
        
        public TracingAdvisor(Advisor delegate, String name, Tracer tracer) {
            this.delegate = delegate;
            this.name = name;
            this.tracer = tracer;
        }
        
        @Override
        public ChatResponse advise(ChatContext context, ChatResponse response) {
            long startTime = System.currentTimeMillis();
            Span span = tracer.nextSpan().name("advisor." + name).start();
            
            try {
                span.tag("advisor.type", name);
                ChatResponse result = delegate.advise(context, response);
                span.tag("advisor.execution.success", "true");
                return result;
            } catch (Exception e) {
                span.error(e);
                throw e;
            } finally {
                span.tag("advisor.duration.ms", 
                    String.valueOf(System.currentTimeMillis() - startTime));
                span.end();
            }
        }
        
        // delegate other methods to delegate...
    }
}

6. Profiling Advisor Chain Execution and Diagnosing Bottlenecks

6.1 How the Advisor Chain Executes

+------------------------------------------------------------------+
|                           User Request                           |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [1] SafeGuardAdvisor (content safety)                ~5ms       |
|      - sensitive-word detection                                  |
|      - safety-policy enforcement                                 |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [2] ReReading Advisor (Re2)                          ~10ms      |
|      - prompt enhancement                                        |
|      - instruction reinforcement                                 |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [3] ChatMemoryAdvisor (conversation history)         ~2ms       |
|      - history loading                                           |
|      - token-budget control                                      |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [4] QuestionAnswerAdvisor (RAG retrieval)            ~100-500ms |
|      - vector search                                             |
|      - context assembly                                          |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [5] ToolCallAdvisor (tool calling)                   ~50-2000ms |
|      - function selection                                        |
|      - tool execution                                            |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [6] LastMaxTokenSizeContentPurger                    ~1ms       |
|      - token-limit management                                    |
|      - context truncation                                        |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|  [7] ChatModel (LLM call)                             ~500-3000ms|
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|                             Response                             |
+------------------------------------------------------------------+
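Summing the per-stage estimates in the diagram shows why optimization effort belongs on the LLM call and tool execution rather than the lightweight advisors. A quick back-of-envelope using the midpoint of each range (these midpoints are illustrative, taken from the ranges above):

```java
public class LatencyBudgetSketch {
    public static void main(String[] args) {
        // Midpoint estimates (ms) for each stage in the diagram
        long safeguard = 5, reReading = 10, memory = 2, rag = 300,
             tools = 1025, purge = 1, llm = 1750;
        long total = safeguard + reReading + memory + rag + tools + purge + llm;
        System.out.println("total ~" + total + "ms");       // total ~3093ms
        // LLM + tools account for roughly nine-tenths of end-to-end latency
        System.out.println("llm+tools share: "
                + (100 * (llm + tools) / total) + "%");     // prints 89%
    }
}
```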

6.2 Monitoring Advisor Chain Performance

java
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.*;
import reactor.core.publisher.Flux;

import java.time.Duration;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Measures the execution time of each Advisor in the chain.
 */
@Component
public class AdvisorChainProfiler {
    
    private final MeterRegistry registry;
    private final Map<String, Timer> advisorTimers = new ConcurrentHashMap<>();
    
    public AdvisorChainProfiler(MeterRegistry registry) {
        this.registry = registry;
    }
    
    /**
     * Records the execution time of a single Advisor.
     */
    public void recordAdvisorExecution(String advisorName, long durationMs) {
        Timer timer = advisorTimers.computeIfAbsent(advisorName, name -> 
            Timer.builder("advisor.execution.time")
                .tag("advisor", name)
                .description("Advisor execution time")
                .register(registry)
        );
        
        timer.record(Duration.ofMillis(durationMs));
    }
    
    /**
     * Builds an Advisor chain wrapped with performance monitoring.
     */
    public List<Advisor> createProfiledAdvisorChain(
            List<Advisor> originalAdvisors,
            MeterRegistry registry) {
        
        List<Advisor> profiled = new ArrayList<>();
        
        for (int i = 0; i < originalAdvisors.size(); i++) {
            Advisor original = originalAdvisors.get(i);
            String advisorName = getAdvisorName(original);
            int order = i;
            
            profiled.add(new ProfiledAdvisor(original, advisorName, order, this));
        }
        
        return profiled;
    }
    
    private String getAdvisorName(Advisor advisor) {
        // Derive the name from the Advisor's class name
        String name = advisor.getClass().getSimpleName();
        if (name.contains("$")) {
            name = name.split("\\$")[0];
        }
        return name;
    }
    
    /**
     * Advisor wrapper that records execution time.
     */
    static class ProfiledAdvisor implements Advisor {
        
        private final Advisor delegate;
        private final String name;
        private final int order;
        private final AdvisorChainProfiler profiler;
        
        public ProfiledAdvisor(Advisor delegate, String name, 
                              int order, AdvisorChainProfiler profiler) {
            this.delegate = delegate;
            this.name = name;
            this.order = order;
            this.profiler = profiler;
        }
        
        @Override
        public ChatResponse advise(ChatContext context, ChatResponse response) {
            long startTime = System.nanoTime();
            
            try {
                return delegate.advise(context, response);
            } finally {
                long durationMs = (System.nanoTime() - startTime) / 1_000_000;
                profiler.recordAdvisorExecution(name, durationMs);
                
                // Alert on slow Advisors
                if (durationMs > 100) {
                    // in production, route this to a proper logger
                    System.out.println("SLOW ADVISOR: " + name + 
                        " took " + durationMs + "ms");
                }
            }
        }
        
        @Override
        public Flux<ChatResponse> adviseStream(ChatContext context, 
                                                Flux<ChatResponse> responseStream) {
            
            long startTime = System.nanoTime();
            
            return responseStream
                .doFinally(signal -> {
                    long durationMs = (System.nanoTime() - startTime) / 1_000_000;
                    profiler.recordAdvisorExecution(name + "_stream", durationMs);
                });
        }
        
        @Override
        public int getOrder() {
            return delegate.getOrder();
        }
    }
}

6.3 Bottleneck Diagnosis and Optimization Suggestions

java
import org.springframework.stereotype.Component;

import java.util.*;
import java.util.stream.Collectors;

/**
 * Analyzes the Advisor chain for performance bottlenecks.
 */
@Component
public class AdvisorBottleneckAnalyzer {
    
    /**
     * Analyzes per-Advisor timings and flags bottlenecks.
     */
    public BottleneckReport analyzeChain(List<AdvisorMetrics> metrics) {
        
        // Total time across the chain
        long totalTime = metrics.stream()
            .mapToLong(AdvisorMetrics::getDurationMs)
            .sum();
        
        // Per-Advisor share of total time
        List<AdvisorAnalysis> analysisList = metrics.stream()
            .map(m -> {
                double percentage = totalTime == 0 ? 0.0 : (m.getDurationMs() * 100.0) / totalTime;
                return new AdvisorAnalysis(
                    m.getName(),
                    m.getDurationMs(),
                    percentage,
                    getSuggestion(m.getName(), percentage)
                );
            })
            .sorted(Comparator.comparingDouble(AdvisorAnalysis::getPercentage).reversed())
            .collect(Collectors.toList());
        
        // Flag any Advisor consuming more than 30% of total time
        List<String> bottlenecks = analysisList.stream()
            .filter(a -> a.getPercentage() > 30)
            .map(AdvisorAnalysis::getName)
            .collect(Collectors.toList());
        
        return new BottleneckReport(analysisList, totalTime, bottlenecks);
    }
    
    private String getSuggestion(String advisorName, double percentage) {
        return switch (advisorName.toLowerCase()) {
            case "questionansweradvisor" -> 
                "Optimize vector retrieval: check index construction, add caching, tune topK";
            case "toolcalladvisor" -> 
                "Reduce the number of tool definitions; optimize tool execution logic";
            case "chatmemoryadvisor" -> 
                "Use a distributed cache such as Redis; shrink the history window";
            case "chatmodel" -> 
                "Consider a faster model; enable streaming responses";
            default -> "Keep monitoring and watch the performance trend";
        };
    }
    
    static class AdvisorMetrics {
        private final String name;
        private final long durationMs;
        
        public AdvisorMetrics(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }
        
        public String getName() { return name; }
        public long getDurationMs() { return durationMs; }
    }
    
    static class AdvisorAnalysis {
        private String name;
        private long durationMs;
        private double percentage;
        private String suggestion;
        
        public AdvisorAnalysis(String name, long durationMs, 
                              double percentage, String suggestion) {
            this.name = name;
            this.durationMs = durationMs;
            this.percentage = percentage;
            this.suggestion = suggestion;
        }
        
        public String getName() { return name; }
        public long getDurationMs() { return durationMs; }
        public double getPercentage() { return percentage; }
        public String getSuggestion() { return suggestion; }
    }
    
    static class BottleneckReport {
        private List<AdvisorAnalysis> analysis;
        private long totalTimeMs;
        private List<String> bottlenecks;
        
        public BottleneckReport(List<AdvisorAnalysis> analysis, 
                               long totalTimeMs, List<String> bottlenecks) {
            this.analysis = analysis;
            this.totalTimeMs = totalTimeMs;
            this.bottlenecks = bottlenecks;
        }
        
        // getters...
    }
}
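The percentage-and-threshold logic above is plain arithmetic and can be checked in isolation. In the sketch below, the advisor names and timings are made up for illustration, and the analyzer is reduced to the single decision that matters, "does this Advisor exceed 30% of total chain time?":

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BottleneckSketch {

    // Returns the advisors whose share of total time exceeds 30%
    static List<String> findBottlenecks(Map<String, Long> durationsMs) {
        long total = durationsMs.values().stream().mapToLong(Long::longValue).sum();
        return durationsMs.entrySet().stream()
                .filter(e -> total > 0 && e.getValue() * 100.0 / total > 30)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> timings = Map.of(
                "ChatMemoryAdvisor", 2L,
                "QuestionAnswerAdvisor", 300L,
                "ChatModel", 1700L);
        // total = 2002ms; ChatModel ~85%, QuestionAnswerAdvisor ~15%
        System.out.println(findBottlenecks(timings)); // prints [ChatModel]
    }
}
```

With the diagram's typical timings, only the LLM call crosses the 30% line, which matches the earlier observation that model latency dominates the chain.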

7. The EVAL Framework: LLM-Based Response Quality Evaluation

7.1 EVAL Framework Overview

Spring AI 2.0 provides an EVAL framework for automatically assessing the quality of AI responses:

+-----------------------------------------------------------------+
|                         EVAL Framework                          |
+-----------------------------------------------------------------+
|                                                                 |
|  +------------+      +-------------+      +---------------+     |
|  | Original   |      |   LLM       |      |  Evaluation   |     |
|  | prompt     |----->|   response  |----->|  result       |     |
|  +------------+      +-------------+      +---------------+     |
|        |                  |                      |              |
|        v                  v                      v              |
|  +------------+      +-------------+      +---------------+     |
|  | Reference  |      |  Scoring    |      |  Report       |     |
|  | (optional) |      |  (Criteria) |      |  score+reason |     |
|  +------------+      +-------------+      +---------------+     |
|                                                                 |
+-----------------------------------------------------------------+

7.2 响应质量评估实现

java 复制代码
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.regex.Pattern;

/**
 * AI 响应质量评估器
 */
@Component
public class ResponseQualityEvaluator {
    
    private final ChatModel chatModel;
    
    public ResponseQualityEvaluator(ChatModel chatModel) {
        this.chatModel = chatModel;
    }
    
    /**
     * 综合质量评估
     */
    public EvaluationResult evaluate(String question, String response, 
                                     String referenceAnswer) {
        
        // 1. 准确性评估
        double accuracyScore = evaluateAccuracy(question, response, referenceAnswer);
        
        // 2. 完整性评估
        double completenessScore = evaluateCompleteness(question, response);
        
        // 3. 幻觉检测
        HallucinationResult hallucination = detectHallucination(question, response, referenceAnswer);
        
        // 4. 毒性检测
        double toxicityScore = evaluateToxicity(response);
        
        // 综合评分
        double overallScore = (accuracyScore * 0.4 + 
                               completenessScore * 0.3 + 
                               (hallucination.isPassed() ? 1.0 : 0.0) * 0.2 +
                               toxicityScore * 0.1);
        
        return new EvaluationResult(
            accuracyScore,
            completenessScore,
            hallucination,
            toxicityScore,
            overallScore,
            overallScore >= 0.7 ? "PASS" : "FAIL"
        );
    }
    
    /**
     * 准确性评估
     */
    private double evaluateAccuracy(String question, String response, 
                                    String referenceAnswer) {
        
        String evaluationPrompt = String.format("""
            你是一个专业的 AI 评估专家。请评估以下 AI 回答的准确性。
            
            问题:%s
            
            AI 回答:%s
            
            参考答案:%s
            
            请从以下维度进行评分:
            1. 答案是否正确回答了问题?
            2. 答案与参考答案的一致性如何?
            3. 是否有事实性错误?
            
            请返回 JSON 格式的评分结果:
            {
                "score": 0.0-1.0,
                "reason": "评分理由",
                "errors": ["错误1", "错误2"]
            }
            """, question, response, referenceAnswer);
        
        try {
            ChatResponse chatResponse = chatModel.call(
                new Prompt(new UserMessage(evaluationPrompt))
            );
            return parseScore(chatResponse.getResult().getOutput().getText());
        } catch (Exception e) {
            return 0.5; // 默认中等分数
        }
    }
    
    /**
     * 完整性评估
     */
    private double evaluateCompleteness(String question, String response) {
        
        String evaluationPrompt = String.format("""
            请评估以下回答的完整性。
            
            问题:%s
            
            回答:%s
            
            评估维度:
            1. 是否覆盖了问题的所有方面?
            2. 解释是否足够详细?
            3. 是否有遗漏的重要信息?
            
            返回 JSON:
            {
                "score": 0.0-1.0,
                "missing_aspects": ["缺失方面1", "缺失方面2"],
                "reason": "评分理由"
            }
            """, question, response);
        
        try {
            ChatResponse chatResponse = chatModel.call(
                new Prompt(new UserMessage(evaluationPrompt))
            );
            return parseScore(chatResponse.getResult().getOutput().getText());
        } catch (Exception e) {
            return 0.5;
        }
    }
    
    /**
     * 幻觉检测
     */
    private HallucinationResult detectHallucination(String question, String response,
                                                    String referenceAnswer) {
        
        String prompt = String.format("""
            你是一个事实核查专家。请检测以下回答中是否存在幻觉(虚构的、不可验证的信息)。
            
            问题:%s
            
            回答:%s
            
            参考信息:%s
            
            幻觉类型:
            1. 事实性错误 - 与已知事实不符
            2. 不可验证 - 无法确认真假
            3. 过度推断 - 从前提推导出不必然的结论
            4. 虚构引用 - 不存在的引用或来源
            
            返回 JSON:
            {
                "is_hallucination": true/false,
                "hallucination_type": "factual/unverifiable/over_claimed/fabricated/null",
                "confidence": 0.0-1.0,
                "problematic_sentences": ["问题句子1", "问题句子2"],
                "explanation": "解释说明"
            }
            """, question, response, referenceAnswer != null ? referenceAnswer : "无参考");
        
        try {
            ChatResponse chatResponse = chatModel.call(
                new Prompt(new UserMessage(prompt))
            );
            return parseHallucinationResult(chatResponse.getResult().getOutput().getText());
        } catch (Exception e) {
            // 评估调用失败时默认放行,避免把服务故障误判为幻觉
            return new HallucinationResult(true, "error", 0.0, 
                List.of(), "评估失败,使用默认结果");
        }
    }
    
    /**
     * 毒性检测
     */
    private double evaluateToxicity(String response) {
        
        // 简单的关键词检测 + LLM 评估
        // 实际生产应使用专门的毒性检测模型
        
        List<String> toxicPatterns = List.of(
            "暴力", "仇恨", "歧视", "违法", "犯罪",
            "赌博", "毒品", "色情", "自杀", "自残"
        );
        
        boolean containsToxic = toxicPatterns.stream()
            .anyMatch(p -> response.contains(p));
        
        if (containsToxic) {
            return 0.0;
        }
        
        // 调用 LLM 进行细粒度检测
        String toxicityPrompt = String.format("""
            请评估以下内容是否包含毒性信息。
            
            内容:%s
            
            返回 JSON:
            {
                "is_toxic": true/false,
                "toxicity_level": 0.0-1.0,
                "toxic_categories": ["类别1"],
                "reason": "理由"
            }
            """, response);
        
        try {
            ChatResponse chatResponse = chatModel.call(
                new Prompt(new UserMessage(toxicityPrompt))
            );
            return 1.0 - parseToxicity(chatResponse.getResult().getOutput().getText());
        } catch (Exception e) {
            return 1.0;
        }
    }
    
    private double parseScore(String json) {
        // 简化解析,实际应使用 Jackson
        try {
            var pattern = Pattern.compile("\"score\"\\s*:\\s*([0-9.]+)");
            var matcher = pattern.matcher(json);
            if (matcher.find()) {
                return Double.parseDouble(matcher.group(1));
            }
        } catch (Exception e) {
            // ignore
        }
        return 0.5;
    }
    
    private HallucinationResult parseHallucinationResult(String json) {
        try {
            boolean isHallucination = json.contains("\"is_hallucination\" : true") || 
                                     json.contains("\"is_hallucination\": true");
            // 注意:passed 语义是"未检出幻觉",与 is_hallucination 相反
            return new HallucinationResult(
                !isHallucination,
                isHallucination ? "factual" : "none",
                0.8,
                List.of(),
                "parsed from response"
            );
        } catch (Exception e) {
            // 解析失败时默认放行
            return new HallucinationResult(true, "unknown", 0.0, List.of(), "parse error");
        }
    }
    
    private double parseToxicity(String json) {
        try {
            var pattern = Pattern.compile("\"toxicity_level\"\\s*:\\s*([0-9.]+)");
            var matcher = pattern.matcher(json);
            if (matcher.find()) {
                return Double.parseDouble(matcher.group(1));
            }
        } catch (Exception e) {
            // ignore
        }
        return 0.0;
    }
    
    // 内部类定义
    public static class EvaluationResult {
        private final double accuracyScore;
        private final double completenessScore;
        private final HallucinationResult hallucination;
        private final double toxicityScore;
        private final double overallScore;
        private final String passStatus;
        
        public EvaluationResult(double accuracyScore, double completenessScore,
                               HallucinationResult hallucination, double toxicityScore,
                               double overallScore, String passStatus) {
            this.accuracyScore = accuracyScore;
            this.completenessScore = completenessScore;
            this.hallucination = hallucination;
            this.toxicityScore = toxicityScore;
            this.overallScore = overallScore;
            this.passStatus = passStatus;
        }
        
        // getters
        public double getAccuracyScore() { return accuracyScore; }
        public double getCompletenessScore() { return completenessScore; }
        public HallucinationResult getHallucination() { return hallucination; }
        public double getToxicityScore() { return toxicityScore; }
        public double getOverallScore() { return overallScore; }
        public String getPassStatus() { return passStatus; }
    }
    
    public static class HallucinationResult {
        private final boolean passed;
        private final String type;
        private final double confidence;
        private final List<String> problematicSentences;
        private final String explanation;
        
        public HallucinationResult(boolean passed, String type, double confidence,
                                   List<String> problematicSentences, String explanation) {
            this.passed = passed;
            this.type = type;
            this.confidence = confidence;
            this.problematicSentences = problematicSentences;
            this.explanation = explanation;
        }
        
        public boolean isPassed() { return passed; }
        public String getType() { return type; }
        public double getConfidence() { return confidence; }
        public List<String> getProblematicSentences() { return problematicSentences; }
        public String getExplanation() { return explanation; }
    }
}
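上面的 parseScore 用正则做了简化解析,但 LLM 实际返回的 JSON 经常被包在 Markdown 代码围栏里。下面是一个稍稳健的变体(纯 JDK 草图;生产中仍建议用 Jackson 做严格的 JSON 解析):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoreParser {

    private static final Pattern SCORE =
        Pattern.compile("\"score\"\\s*:\\s*([0-9]*\\.?[0-9]+)");

    // raw 为 LLM 原始输出;解析失败时返回 fallback
    static double parse(String raw, double fallback) {
        // 先剥离 ```json ... ``` 这类 Markdown 围栏
        String json = raw.replaceAll("```(?:json)?", "").trim();
        Matcher m = SCORE.matcher(json);
        if (m.find()) {
            try {
                double v = Double.parseDouble(m.group(1));
                return Math.max(0.0, Math.min(1.0, v)); // 裁剪到 [0, 1]
            } catch (NumberFormatException ignored) {
            }
        }
        return fallback;
    }
}
```

裁剪到 [0, 1] 这一步很重要:评估模型偶尔会返回超出约定区间的分数,不裁剪会污染后续的加权计算。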

7.3 实战:构建企业级幻觉检测系统

java 复制代码
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/**
 * 企业级幻觉检测系统
 */
@Component
public class EnterpriseHallucinationDetector {
    
    private final ChatModel chatModel;
    private final VectorStore referenceStore;
    
    // 事实库:用于快速验证常见事实
    private final Map<String, Boolean> factualDatabase = new ConcurrentHashMap<>();
    
    public EnterpriseHallucinationDetector(ChatModel chatModel, 
                                          VectorStore referenceStore) {
        this.chatModel = chatModel;
        this.referenceStore = referenceStore;
        initializeFactualDatabase();
    }
    
    /**
     * 多层幻觉检测
     */
    public HallucinationReport detect(String question, String response) {
        
        List<HallucinationIssue> issues = new ArrayList<>();
        
        // 第一层:规则检测(快速)
        List<HallucinationIssue> ruleIssues = ruleBasedDetection(response);
        issues.addAll(ruleIssues);
        
        // 第二层:事实库验证
        List<HallucinationIssue> factIssues = factualVerification(response);
        issues.addAll(factIssues);
        
        // 第三层:语义检索验证
        if (referenceStore != null) {
            List<HallucinationIssue> semanticIssues = semanticVerification(question, response);
            issues.addAll(semanticIssues);
        }
        
        // 第四层:LLM 深度分析
        List<HallucinationIssue> llmIssues = llmDeepAnalysis(question, response);
        issues.addAll(llmIssues);
        
        // 计算综合置信度
        double confidence = calculateConfidence(issues);
        
        return new HallucinationReport(
            issues.isEmpty(),
            confidence,
            issues,
            response.length()
        );
    }
    
    /**
     * 第一层:规则检测
     */
    private List<HallucinationIssue> ruleBasedDetection(String response) {
        List<HallucinationIssue> issues = new ArrayList<>();
        
        // 检测绝对性表述
        Pattern absolutePattern = Pattern.compile(
            "(绝对|肯定|必然|100%|所有|永远|从不|不可否认|毫无疑问)"
        );
        
        if (absolutePattern.matcher(response).find()) {
            issues.add(new HallucinationIssue(
                "absolute_statement",
                "检测到绝对性表述,可能存在过度推断",
                0.3,
                "warning"
            ));
        }
        
        // 检测虚构引用
        Pattern citationPattern = Pattern.compile(
            "(《[^》]+》|出自|根据|研究表明|数据显示|据[^\\s]+报道)"
        );
        
        if (citationPattern.matcher(response).find()) {
            // 需要进一步验证引用真实性
            issues.add(new HallucinationIssue(
                "citation_found",
                "检测到引用表述,需验证真实性",
                0.2,
                "info"
            ));
        }
        
        // 检测模糊表述
        Pattern vaguePattern = Pattern.compile(
            "(可能|也许|大概|似乎|据说|有人|某些人)"
        );
        
        if (vaguePattern.matcher(response).find()) {
            issues.add(new HallucinationIssue(
                "vague_statement",
                "检测到模糊表述,信息来源不明确",
                0.2,
                "info"
            ));
        }
        
        return issues;
    }
    
    /**
     * 第二层:事实库验证
     */
    private List<HallucinationIssue> factualVerification(String response) {
        List<HallucinationIssue> issues = new ArrayList<>();
        
        // 提取关键claims并验证
        List<String> claims = extractClaims(response);
        
        for (String claim : claims) {
            if (factualDatabase.containsKey(claim)) {
                boolean isFact = factualDatabase.get(claim);
                if (!isFact) {
                    issues.add(new HallucinationIssue(
                        "fact_check_failed",
                        "声称 '" + claim + "' 与已知事实不符",
                        0.9,
                        "error"
                    ));
                }
            }
        }
        
        return issues;
    }
    
    /**
     * 第三层:语义检索验证
     */
    private List<HallucinationIssue> semanticVerification(String question, String response) {
        List<HallucinationIssue> issues = new ArrayList<>();
        
        try {
            // 检索相关文档
            List<Document> relevantDocs = referenceStore.similaritySearch(
                SearchRequest.builder().query(response).topK(3).build()
            );
            
            if (relevantDocs.isEmpty()) {
                issues.add(new HallucinationIssue(
                    "no_reference_found",
                    "未找到相关参考文档支持该回答",
                    0.5,
                    "warning"
                ));
            } else {
                // 计算支持度
                double support = calculateSupport(response, relevantDocs);
                if (support < 0.3) {
                    issues.add(new HallucinationIssue(
                        "low_support",
                        String.format("参考文档支持度仅为 %.0f%%", support * 100),
                        0.7,
                        "warning"
                    ));
                }
            }
        } catch (Exception e) {
            // 检索失败,记录日志
        }
        
        return issues;
    }
    
    /**
     * 第四层:LLM 深度分析
     */
    private List<HallucinationIssue> llmDeepAnalysis(String question, String response) {
        List<HallucinationIssue> issues = new ArrayList<>();
        
        String analysisPrompt = String.format("""
            你是一个专业的幻觉检测专家。请分析以下回答是否存在幻觉。
            
            原始问题:%s
            
            回答内容:%s
            
            请识别:
            1. 事实性错误(与客观事实不符)
            2. 不可验证的声称(无法确认真假)
            3. 过度推断(从给定前提得出不必然的结论)
            4. 内部矛盾(回答内部逻辑不一致)
            
            返回严格的 JSON 格式:
            {
                "has_hallucination": true/false,
                "issues": [
                    {
                        "type": "factual_error/unverifiable/over_claim/contradiction",
                        "description": "问题描述",
                        "evidence": "证据或原文引用",
                        "severity": "high/medium/low"
                    }
                ],
                "overall_confidence": 0.0-1.0,
                "summary": "一句话总结"
            }
            """, question, response);
        
        try {
            ChatResponse chatResponse = chatModel.call(
                new Prompt(new UserMessage(analysisPrompt))
            );
            
            // 解析LLM返回的JSON并转换为issues
            // 简化处理
            String result = chatResponse.getResult().getOutput().getText();
            
            if (result.contains("\"has_hallucination\" : true") || 
                result.contains("\"has_hallucination\": true")) {
                issues.add(new HallucinationIssue(
                    "llm_detected",
                    "LLM 检测到潜在幻觉",
                    0.6,
                    "warning"
                ));
            }
            
        } catch (Exception e) {
            // LLM 调用失败
        }
        
        return issues;
    }
    
    private List<String> extractClaims(String text) {
        // 简化:按句子提取
        List<String> claims = new ArrayList<>();
        String[] sentences = text.split("[。!?]");
        
        for (String sentence : sentences) {
            if (sentence.length() > 10 && sentence.length() < 100) {
                claims.add(sentence.trim());
            }
        }
        
        return claims;
    }
    
    private double calculateSupport(String response, List<Document> documents) {
        // 简化:基于空白分词的关键词重叠度
        // 注意:中文文本没有空格分隔,生产中应改用分词器或字符 n-gram
        Set<String> responseWords = new HashSet<>(Arrays.asList(response.split("\\s+")));
        double totalOverlap = 0;
        
        for (Document doc : documents) {
            Set<String> docWords = new HashSet<>(Arrays.asList(doc.getText().split("\\s+")));
            Set<String> overlap = new HashSet<>(responseWords);
            overlap.retainAll(docWords);
            totalOverlap += (double) overlap.size() / responseWords.size();
        }
        
        return totalOverlap / documents.size();
    }
    
    private double calculateConfidence(List<HallucinationIssue> issues) {
        if (issues.isEmpty()) {
            return 0.0;
        }
        
        double total = issues.stream()
            .mapToDouble(i -> i.severity.equals("error") ? 1.0 : 
                        i.severity.equals("warning") ? 0.6 : 0.3)
            .sum();
        
        return Math.min(total, 1.0);
    }
    
    private void initializeFactualDatabase() {
        // 初始化常见事实(实际生产应从数据库加载)
        factualDatabase.put("太阳从东方升起", true);
        factualDatabase.put("水的化学式是H2O", true);
        factualDatabase.put("地球围绕太阳转", true);
    }
    
    // 内部类
    public static class HallucinationReport {
        private final boolean passed;
        private final double confidence;
        private final List<HallucinationIssue> issues;
        private final int responseLength;
        
        public HallucinationReport(boolean passed, double confidence, 
                                   List<HallucinationIssue> issues, int responseLength) {
            this.passed = passed;
            this.confidence = confidence;
            this.issues = issues;
            this.responseLength = responseLength;
        }
        
        public boolean isPassed() { return passed; }
        public double getConfidence() { return confidence; }
        public List<HallucinationIssue> getIssues() { return issues; }
    }
    
    public static class HallucinationIssue {
        private final String type;
        private final String description;
        private final double severity;
        private final String severityLevel;
        
        public HallucinationIssue(String type, String description, 
                                  double severity, String severityLevel) {
            this.type = type;
            this.description = description;
            this.severity = severity;
            this.severityLevel = severityLevel;
        }
        
        public String getType() { return type; }
        public String getDescription() { return description; }
        public double getSeverity() { return severity; }
    }
}
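上面 calculateSupport 按空白切词,对没有空格分隔的中文文本几乎失效。一个可行的替代思路是用字符二元组(bigram)重叠度来度量支持度(纯 JDK 草图,属本文的示意实现,并非 Spring AI 提供的 API):

```java
import java.util.HashSet;
import java.util.Set;

class BigramSupport {

    // 把字符串切成相邻两字符组成的 bigram 集合
    static Set<String> bigrams(String s) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            set.add(s.substring(i, i + 2));
        }
        return set;
    }

    // 返回 response 的 bigram 被参考文档覆盖的比例,范围 [0, 1]
    static double support(String response, String doc) {
        Set<String> r = bigrams(response);
        if (r.isEmpty()) {
            return 0.0;
        }
        Set<String> overlap = new HashSet<>(r);
        overlap.retainAll(bigrams(doc));
        return (double) overlap.size() / r.size();
    }
}
```

bigram 对中文的区分度明显好于整句匹配,又比引入完整分词器轻量;若要求更高精度,可在此基础上替换为向量相似度。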

7.4 EVAL 集成到 Advisor 链

java 复制代码
import org.springframework.ai.chat.client.ChatClientRequest;
import org.springframework.ai.chat.client.ChatClientResponse;
import org.springframework.ai.chat.client.advisor.api.CallAdvisor;
import org.springframework.ai.chat.client.advisor.api.CallAdvisorChain;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.stereotype.Component;

import java.util.List;

/**
 * EVAL 评估 Advisor:实现 CallAdvisor 接口,在链路末端对响应做质量把关
 */
@Component
public class EvaluationAdvisor implements CallAdvisor {
    
    private final ResponseQualityEvaluator evaluator;
    private final EnterpriseHallucinationDetector hallucinationDetector;
    private final double qualityThreshold;
    
    public EvaluationAdvisor(ResponseQualityEvaluator evaluator,
                             EnterpriseHallucinationDetector hallucinationDetector,
                             double qualityThreshold) {
        this.evaluator = evaluator;
        this.hallucinationDetector = hallucinationDetector;
        this.qualityThreshold = qualityThreshold;
    }
    
    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        
        // 先执行链路后续环节,拿到模型响应
        ChatClientResponse response = chain.nextCall(request);
        
        String question = extractQuestion(request);
        String answer = extractAnswer(response);
        
        // 1. 幻觉检测
        EnterpriseHallucinationDetector.HallucinationReport hallucinationReport = 
            hallucinationDetector.detect(question, answer);
        
        if (!hallucinationReport.isPassed()) {
            // 记录警告(生产中应接入日志/告警系统)
            System.out.println("HALLUCINATION DETECTED: " + 
                hallucinationReport.getConfidence());
            
            // 决策:拒绝/警告/重新生成
            if (hallucinationReport.getConfidence() > 0.8) {
                // 高置信度幻觉,在上下文中打标记,供上游决定降级或重试
                response.context().put("hallucination.detected", true);
            }
        }
        
        // 2. 质量评估(可选,异步执行)
        // 实际生产应异步处理,避免影响响应延迟
        
        return response;
    }
    
    private String extractQuestion(ChatClientRequest request) {
        // 取最后一条用户消息作为问题
        List<Message> messages = request.prompt().getInstructions();
        for (int i = messages.size() - 1; i >= 0; i--) {
            if (messages.get(i) instanceof UserMessage userMessage) {
                return userMessage.getText();
            }
        }
        return "";
    }
    
    private String extractAnswer(ChatClientResponse response) {
        return response.chatResponse().getResult().getOutput().getText();
    }
    
    @Override
    public String getName() {
        return "evaluation-advisor";
    }
    
    @Override
    public int getOrder() {
        return Integer.MAX_VALUE; // 最后执行
    }
}
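代码注释里提到"质量评估应异步执行,避免影响响应延迟"。把评估移出请求路径的一个纯 JDK 草图如下(evaluate 为占位实现,真实场景中应委托给前文的 ResponseQualityEvaluator,并把结果落库或打点):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncEvalSketch {

    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // 占位评估逻辑:真实实现中在此调用 LLM 评估
    double evaluate(String question, String answer) {
        return 0.9;
    }

    // 请求线程立即拿到 Future 并返回响应,评估在后台线程完成
    CompletableFuture<Double> evaluateAsync(String question, String answer) {
        return CompletableFuture.supplyAsync(() -> evaluate(question, answer), pool);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

这样用户侧延迟只包含模型调用本身,评估结果以事后指标的形式进入监控体系;代价是无法在返回前拦截低质量响应,需要与同步幻觉检测配合使用。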

八、完整集成示例

8.1 一站式可观测性配置

java 复制代码
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Spring AI 可观测性自动配置
 */
@Configuration
public class AIObservabilityConfiguration {
    
    @Bean
    @ConfigurationProperties(prefix = "ai.observability")
    public AIObservabilityProperties observabilityProperties() {
        return new AIObservabilityProperties();
    }
    
    @Bean
    public AIMetricsCollector aiMetricsCollector(
            MeterRegistry registry,
            AIObservabilityProperties properties) {
        return new AIMetricsCollector(registry, properties);
    }
    
    @Bean
    public AdvisorChainProfiler advisorChainProfiler(MeterRegistry registry) {
        return new AdvisorChainProfiler(registry);
    }
    
    @Bean
    public ResponseQualityEvaluator responseQualityEvaluator(ChatModel chatModel) {
        return new ResponseQualityEvaluator(chatModel);
    }
    
    @Bean
    public EnterpriseHallucinationDetector enterpriseHallucinationDetector(
            ChatModel chatModel,
            VectorStore vectorStore) {
        return new EnterpriseHallucinationDetector(chatModel, vectorStore);
    }
    
    @Bean
    public EvaluationAdvisor evaluationAdvisor(
            ResponseQualityEvaluator evaluator,
            EnterpriseHallucinationDetector detector,
            AIObservabilityProperties properties) {
        return new EvaluationAdvisor(evaluator, detector, 
            properties.getQualityThreshold());
    }
    
    // 配置属性类
    static class AIObservabilityProperties {
        private boolean enabled = true;
        private double qualityThreshold = 0.7;
        private boolean enableEvaluation = false;
        private boolean enableHallucinationDetection = true;
        
        // getters and setters
        public boolean isEnabled() { return enabled; }
        public void setEnabled(boolean enabled) { this.enabled = enabled; }
        public double getQualityThreshold() { return qualityThreshold; }
        public void setQualityThreshold(double qualityThreshold) { 
            this.qualityThreshold = qualityThreshold; 
        }
        public boolean isEnableEvaluation() { return enableEvaluation; }
        public void setEnableEvaluation(boolean enableEvaluation) { 
            this.enableEvaluation = enableEvaluation; 
        }
        public boolean isEnableHallucinationDetection() { 
            return enableHallucinationDetection; 
        }
        public void setEnableHallucinationDetection(boolean enableHallucinationDetection) {
            this.enableHallucinationDetection = enableHallucinationDetection;
        }
    }
}

8.2 application.yml 完整配置

yaml 复制代码
spring:
  application:
    name: ai-chat-service
  
  ai:
    
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
    
    vectorstore:
      redis:
        url: redis://localhost:6379
        index-name: ai-docs

# 监控配置(Spring Boot 3.x 属性)
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health,info
  metrics:
    tags:
      application: ${spring.application.name}
  prometheus:
    metrics:
      export:
        enabled: true
  tracing:
    sampling:
      probability: 0.3
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans

# 自定义可观测性配置(对应 @ConfigurationProperties(prefix = "ai.observability"))
ai:
  observability:
    enabled: true
    quality-threshold: 0.7
    enable-evaluation: true
    enable-hallucination-detection: true
    metrics:
      latency-percentiles: 0.5,0.95,0.99
      token-price:
        gpt-4o-input: 5.0
        gpt-4o-output: 15.0
        gpt-4o-mini-input: 0.15
        gpt-4o-mini-output: 0.6
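基于上面 token-price 配置做成本估算的最小示例(此处假设价格单位为"美元/百万 token",纯 JDK,可独立运行):

```java
// 按输入/输出 token 数与单价估算单次调用成本
class TokenCost {

    // pricePerM 单位:美元 / 100 万 token(单位为本文假设,需与实际计费口径对齐)
    static double costUsd(long inputTokens, long outputTokens,
                          double inputPricePerM, double outputPricePerM) {
        return inputTokens  / 1_000_000.0 * inputPricePerM
             + outputTokens / 1_000_000.0 * outputPricePerM;
    }

    public static void main(String[] args) {
        // gpt-4o:输入单价 5.0、输出单价 15.0
        System.out.printf("cost = $%.4f%n", TokenCost.costUsd(1_200, 800, 5.0, 15.0));
    }
}
```

把这个计算挂到每次 ChatResponse 的 usage 元数据上累加,就得到前文告警规则里的 ai.cost.hourly 指标。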

九、总结与最佳实践

9.1 可观测性实施路线图

复制代码
阶段一:基础监控(1-2周)
├── 集成 Micrometer + Prometheus
├── 采集基础延迟、错误指标
└── 配置 Grafana 看板

阶段二:成本控制(2-3周)
├── 接入 Token 计数
├── 实现成本计算
└── 配置预算告警

阶段三:链路追踪(2-3周)
├── 集成 Zipkin/Jaeger
├── 自定义业务标签
└── Advisor 链追踪

阶段四:质量保障(3-4周)
├── 实现 EVAL 评估框架
├── 部署幻觉检测系统
└── 建立质量门禁

9.2 关键指标告警规则

指标 告警阈值 说明
ai.call.latency.p99 > 5000ms LLM 调用超时
ai.call.errors.rate > 5% 调用失败率过高
ai.cost.hourly > $100 小时成本超预算
advisor.execution.time > 1000ms 单个 Advisor 慢
hallucination.confidence > 0.8 高置信度幻觉
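上表阈值落地为 Prometheus 告警规则的一个示例(指标名取自本文自定义埋点命名,属示意性假设,需与实际注册的 Meter 名称对齐):

```yaml
groups:
  - name: ai-observability
    rules:
      - alert: AICallLatencyP99High
        # 假设延迟指标以 ai_call_latency_seconds 直方图形式导出
        expr: histogram_quantile(0.99, rate(ai_call_latency_seconds_bucket[5m])) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LLM 调用 P99 延迟超过 5s"
      - alert: AICallErrorRateHigh
        expr: rate(ai_call_errors_total[5m]) / rate(ai_call_total[5m]) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "AI 调用失败率超过 5%"
```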

9.3 架构师视角的思考

企业级 AI 应用的可观测性建设,本质上是在解决三个核心问题:

  1. 可观测性 -> 可控性:通过监控延迟、错误率、成本,确保 AI 服务在可控范围内运行
  2. 可观测性 -> 质量保障:通过 EVAL 框架和幻觉检测,把控 AI 输出的质量底线
  3. 可观测性 -> 持续优化:通过 Advisor 链剖析,识别性能瓶颈,指导架构优化

Spring AI 2.0 的可观测性体系与 Spring 生态深度集成,对于已有 Spring Boot 技术栈的团队来说,是构建企业级 AI 应用的最佳选择。

记住:AI 应用的可观测性不是锦上添花,而是生产落地的必备基础设施。在追求 AI 能力的同时,必须同步建设与之匹配的可观测性能力。

