Spring AI 2.0 Observability: AI Operation Tracing, Metrics Monitoring, and the Evaluation Framework
Abstract: For enterprise AI applications, observability is the core infrastructure for keeping systems stable, controlling cost, and assessing quality. Spring AI 2.0 provides a complete observability stack covering distributed tracing, metrics monitoring, token-consumption analysis, Advisor-chain profiling, and an LLM-based response quality evaluation (EVAL) framework. This article examines how these capabilities work and how to use them in practice, from a senior architect's perspective.
1. Why Do AI Applications Need Observability?
APM tooling for traditional Java applications is mature, but AI applications introduce new challenges:
| Dimension | Traditional microservice | AI application |
|---|---|---|
| Call latency | Tens of milliseconds | Hundreds of milliseconds to seconds |
| Cost model | Relatively fixed | Billed per token, highly variable |
| Quality assessment | Deterministic results | Requires semantic evaluation; hallucination risk |
| Execution path | Synchronous calls | Chained Advisor calls, high complexity |
A production-grade AI application has to answer questions like:
- What does the latency distribution of AI calls look like? What is the P99?
- How many tokens does each conversation consume, and at what cost?
- Is the AI's answer reliable? Does it hallucinate?
- Which link in the Advisor chain is the performance bottleneck?
The Spring AI 2.0 observability stack is designed to answer exactly these questions.
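To make the first of those questions concrete, a percentile such as P99 reduces to sorting a latency sample and indexing into it. This is a minimal stand-alone sketch using the nearest-rank method; in practice Micrometer timers compute percentiles for you:

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Nearest-rank percentile: sort the sample, then index by rank.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] sample = {120, 340, 560, 780, 230, 450, 2900, 610, 380, 510};
        System.out.println("P50=" + percentile(sample, 50));
        System.out.println("P99=" + percentile(sample, 99));
    }
}
```

Note how a single slow outlier (2900 ms) dominates the P99 while leaving the P50 untouched; this is exactly why AI call monitoring needs percentiles rather than averages.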
2. Overall Architecture of Spring AI Observability
+------------------------------------------------------------------+
| Spring AI 2.0 |
+------------------------------------------------------------------+
| +----------------+ +----------------+ +--------------------+ |
| | Metrics | | Tracing | | EVAL | |
| | (Micrometer) | | (Zipkin/Jaeger)| | (Quality Assess) | |
| +--------+-------+ +--------+-------+ +---------+---------+ |
| | | | |
| v v v |
| +--------------------------------------------------+ |
| | ObservationRegistry | |
| +--------------------------------------------------+ |
+------------------------------------------------------------------+
| | |
v v v
[Prometheus] [Zipkin] [LLM Evaluator]
[Grafana] [Jaeger]
Spring AI centralizes observability on Micrometer's ObservationRegistry: all monitoring data flows through the standard Observation API, which supports multiple backends.
3. Token Usage Monitoring: The Foundation of Enterprise Cost Control
3.1 Token Counting with JTokkit
Spring AI 2.0 ships a token count estimator built on JTokkit, a widely used Java implementation of OpenAI-style BPE tokenizers.
java
import org.springframework.ai.tokenizer.JTokkitTokenCountEstimator;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.UserMessage;
// Initialize the token count estimator
JTokkitTokenCountEstimator estimator = new JTokkitTokenCountEstimator();
// Count tokens for a single message
UserMessage userMessage = new UserMessage("Explain Spring Boot auto-configuration in detail");
int userTokens = estimator.estimate(userMessage);
System.out.println("User message tokens: " + userTokens);
// Count tokens for an assistant message
String assistantResponse = "The core idea behind Spring Boot auto-configuration is...";
int assistantTokens = estimator.estimate(assistantResponse);
System.out.println("Assistant response tokens: " + assistantTokens);
// Count tokens for a full conversation (including message history)
List<org.springframework.ai.chat.messages.Message> messages = List.of(
    new UserMessage("What is dependency injection?"),
    new AssistantMessage("Dependency injection is a design pattern..."),
    new UserMessage("Please elaborate"),
    new AssistantMessage("In more detail...")
);
int totalTokens = estimator.estimate(messages);
System.out.println("Total conversation tokens: " + totalTokens);
3.2 Unified Token Counting via the Content Interface
Spring AI's Content interface gives messages and documents a common shape for token counting:
java
import org.springframework.ai.content.Content;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
// A custom document type implementing the Content interface
public class MyDocument implements Content {
private String text;
private Map<String, Object> metadata;
// getters and setters
@Override
public String getText() {
return this.text;
}
@Override
public Map<String, Object> getMetadata() {
return this.metadata;
}
}
// Budgeting the total token count of retrieved documents in a RAG scenario
public class TokenBudgetManager {
private final JTokkitTokenCountEstimator estimator;
private final int maxTokens;
public TokenBudgetManager(int maxTokens) {
this.estimator = new JTokkitTokenCountEstimator();
this.maxTokens = maxTokens;
}
public boolean canFitInContext(List<Content> documents) {
int totalTokens = documents.stream()
.mapToInt(doc -> estimator.estimate(doc.getText()))
.sum();
return totalTokens <= maxTokens;
}
public List<Content> fitDocuments(List<Content> documents) {
List<Content> fitting = new ArrayList<>();
int currentTokens = 0;
for (Content doc : documents) {
int docTokens = estimator.estimate(doc.getText());
if (currentTokens + docTokens <= maxTokens) {
fitting.add(doc);
currentTokens += docTokens;
} else {
                break; // over budget; stop adding documents
}
}
return fitting;
}
}
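The greedy budget-fit logic above can be exercised without JTokkit by substituting a crude characters/4 heuristic. The estimator function and the 4-chars-per-token ratio below are illustrative assumptions, not the library's actual behavior:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class GreedyBudgetFit {
    // Greedily keep documents, in order, while the running token total stays
    // within the budget; everything after the first overflow is dropped.
    static List<String> fit(List<String> docs, ToIntFunction<String> estimator, int maxTokens) {
        List<String> fitting = new ArrayList<>();
        int current = 0;
        for (String doc : docs) {
            int tokens = estimator.applyAsInt(doc);
            if (current + tokens > maxTokens) break;
            fitting.add(doc);
            current += tokens;
        }
        return fitting;
    }

    public static void main(String[] args) {
        // Crude heuristic: roughly 4 characters per token (a stand-in for JTokkit)
        ToIntFunction<String> heuristic = s -> Math.max(1, s.length() / 4);
        List<String> docs = List.of("a".repeat(40), "b".repeat(40), "c".repeat(40));
        // Each doc is ~10 tokens; a 25-token budget fits only the first two
        System.out.println(fit(docs, heuristic, 25).size());
    }
}
```

Because the fit stops at the first overflow, document order matters: rank retrieved documents by relevance before applying the budget.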
4. Micrometer Instrumentation: Latency, Tokens, and Cost in One View
4.1 Spring AI Metrics Auto-Configuration
With spring-boot-starter-actuator and Micrometer on the classpath, Spring AI 2.0 exposes AI-related metrics automatically:
xml
<!-- pom.xml dependency configuration -->
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-spring-boot-starter</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
</dependencies>
yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health
  metrics:
    tags:
      application: ${spring.application.name}
  prometheus:
    metrics:
      export:
        enabled: true
4.2 Custom AI Metric Instrumentation
Enterprise scenarios usually call for finer-grained custom metrics:
java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Component;
import reactor.core.publisher.Flux;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
@Component
public class AIMetricsCollector {
private final MeterRegistry registry;
private final Map<String, AtomicLong> tokenUsageCache = new ConcurrentHashMap<>();
private final Map<String, Timer> callTimers = new ConcurrentHashMap<>();
private final Map<String, Counter> errorCounters = new ConcurrentHashMap<>();
    // Per-model pricing in USD per million tokens. Map.of rejects duplicate
    // keys, so input and output prices must live in separate maps.
    private static final Map<String, Double> INPUT_PRICING = Map.of(
            "gpt-4o", 5.0,
            "gpt-4o-mini", 0.15,
            "claude-3-5-sonnet", 3.0
    );
    private static final Map<String, Double> OUTPUT_PRICING = Map.of(
            "gpt-4o", 15.0,
            "gpt-4o-mini", 0.6,
            "claude-3-5-sonnet", 15.0
    );
    public AIMetricsCollector(MeterRegistry registry) {
        this.registry = registry;
        initializeMetrics();
    }
    private void initializeMetrics() {
        // Register a Gauge for total token usage
        Gauge.builder("ai.token.usage.total", tokenUsageCache,
                map -> map.values().stream()
                        .mapToLong(AtomicLong::get)
                        .sum())
            .tag("type", "all")
            .description("Total tokens used")
            .register(registry);
    }
    /**
     * Records the full set of metrics for a single AI call.
     */
    public void recordCall(String model, String operation,
                           int promptTokens, int completionTokens,
                           long latencyMs, boolean success) {
        String modelKey = model.replace(":", "_");
        // 1. Latency
        Timer timer = callTimers.computeIfAbsent(operation + "_" + modelKey, k ->
            Timer.builder("ai.call.latency")
                .tag("model", model)
                .tag("operation", operation)
                .description("AI call latency")
                .register(registry));
        timer.record(Duration.ofMillis(latencyMs));
        // 2. Token consumption
        if (promptTokens > 0) {
            Counter.builder("ai.token.prompt")
                .tag("model", model)
                .tag("operation", operation)
                .register(registry)
                .increment(promptTokens);
        }
        if (completionTokens > 0) {
            Counter.builder("ai.token.completion")
                .tag("model", model)
                .tag("operation", operation)
                .register(registry)
                .increment(completionTokens);
        }
        // 3. Cost (USD)
        double cost = calculateCost(model, promptTokens, completionTokens);
        Counter.builder("ai.cost.usd")
            .tag("model", model)
            .tag("operation", operation)
            .register(registry)
            .increment(cost);
        // 4. Success / failure counters
        if (success) {
            Counter.builder("ai.call.count")
                .tag("model", model)
                .tag("operation", operation)
                .tag("status", "success")
                .register(registry)
                .increment();
        } else {
            Counter errorCounter = errorCounters.computeIfAbsent(
                operation + "_" + modelKey, k ->
                    Counter.builder("ai.call.errors")
                        .tag("model", model)
                        .tag("operation", operation)
                        .description("AI call errors")
                        .register(registry));
            errorCounter.increment();
        }
    }
    private double calculateCost(String model, int promptTokens, int completionTokens) {
        // Look up per-model prices; fall back to gpt-4o rates for unknown models
        double inputPerToken = INPUT_PRICING.getOrDefault(model, 5.0) / 1_000_000;
        double outputPerToken = OUTPUT_PRICING.getOrDefault(model, 15.0) / 1_000_000;
        return (promptTokens * inputPerToken) + (completionTokens * outputPerToken);
    }
    /**
     * Wraps a ChatModel to collect metrics automatically. Register it as a
     * bean wrapping your concrete ChatModel rather than annotating it with
     * @Component, since it needs an explicit delegate.
     */
    static class InstrumentedChatModel implements ChatModel {
        private final ChatModel delegate;
        private final AIMetricsCollector metrics;
        public InstrumentedChatModel(ChatModel delegate, AIMetricsCollector metrics) {
            this.delegate = delegate;
            this.metrics = metrics;
        }
        @Override
        public ChatResponse call(Prompt prompt) {
            long startTime = System.currentTimeMillis();
            boolean success = false;
            int promptTokens = 0;
            int completionTokens = 0;
            try {
                ChatResponse response = delegate.call(prompt);
                success = true;
                // Extracting token usage from the response is model-specific;
                // adapt this to the metadata your provider returns.
                return response;
            } finally {
                long latency = System.currentTimeMillis() - startTime;
                metrics.recordCall(
                    getDefaultModel(),
                    "sync_call",
                    promptTokens,
                    completionTokens,
                    latency,
                    success
                );
            }
        }
        @Override
        public Flux<ChatResponse> stream(Prompt prompt) {
            long startTime = System.currentTimeMillis();
            return delegate.stream(prompt)
                .doFinally(signalType -> {
                    long latency = System.currentTimeMillis() - startTime;
                    metrics.recordCall(
                        getDefaultModel(),
                        "stream_call",
                        0, 0, // token counts are hard to measure precisely for streams
                        latency,
                        signalType == reactor.core.publisher.SignalType.ON_COMPLETE
                    );
                });
        }
        private String getDefaultModel() {
            // Read the default model name from configuration
            return "gpt-4o";
        }
    }
}
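The cost arithmetic is worth sanity-checking in isolation. This stand-alone sketch isolates the per-call cost calculation; the prices are the illustrative per-million-token figures used in this article, not authoritative vendor pricing:

```java
import java.util.Map;

public class TokenCostCalculator {
    // Illustrative USD prices per million tokens; input and output priced separately
    static final Map<String, Double> INPUT = Map.of("gpt-4o", 5.0, "gpt-4o-mini", 0.15);
    static final Map<String, Double> OUTPUT = Map.of("gpt-4o", 15.0, "gpt-4o-mini", 0.6);

    // Cost of one call = prompt tokens * input rate + completion tokens * output rate
    static double cost(String model, int promptTokens, int completionTokens) {
        double in = INPUT.getOrDefault(model, 5.0) / 1_000_000;
        double out = OUTPUT.getOrDefault(model, 15.0) / 1_000_000;
        return promptTokens * in + completionTokens * out;
    }

    public static void main(String[] args) {
        // 1,000 prompt + 500 completion tokens on gpt-4o:
        // 1000 * $5/1M + 500 * $15/1M = $0.005 + $0.0075 = $0.0125
        System.out.printf("%.4f%n", cost("gpt-4o", 1000, 500));
    }
}
```

Feeding this value into a Counter, as `recordCall` does above, turns per-call cents into an aggregate spend metric that Prometheus can alert on.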
4.3 Grafana Dashboard Configuration
Below is the core panel configuration of a production-grade Grafana dashboard:
json
{
"panels": [
{
"title": "AI 调用延迟分布",
"type": "timeseries",
"gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
"targets": [
{
"expr": "histogram_quantile(0.50, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
"legendFormat": "P50 - {{model}}"
},
{
"expr": "histogram_quantile(0.95, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
"legendFormat": "P95 - {{model}}"
},
{
"expr": "histogram_quantile(0.99, sum(rate(ai_call_latency_seconds_bucket{service=\"chat-service\"}[5m])) by (le, model))",
"legendFormat": "P99 - {{model}}"
}
]
},
{
"title": "Token 消耗趋势",
"type": "timeseries",
"gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
"targets": [
{
"expr": "sum(increase(ai_token_prompt_total[1h])) by (model)",
"legendFormat": "Input - {{model}}"
},
{
"expr": "sum(increase(ai_token_completion_total[1h])) by (model)",
"legendFormat": "Output - {{model}}"
}
]
},
{
"title": "AI 成本实时监控",
"type": "stat",
"gridPos": {"x": 0, "y": 8, "w": 6, "h": 4},
"targets": [
{
"expr": "sum(ai_cost_usd_total{service=\"chat-service\"})",
"legendFormat": "Total Cost"
}
],
"fieldConfig": {
"defaults": {
"unit": "currencyUSD",
"thresholds": {
"steps": [
{"value": 0, "color": "green"},
{"value": 100, "color": "yellow"},
{"value": 500, "color": "red"}
]
}
}
}
},
{
"title": "调用成功率",
"type": "gauge",
"gridPos": {"x": 6, "y": 8, "w": 6, "h": 4},
"targets": [
{
"expr": "sum(rate(ai_call_count_total{service=\"chat-service\",status=\"success\"}[5m])) / sum(rate(ai_call_count_total{service=\"chat-service\"}[5m])) * 100"
}
]
},
{
"title": "模型调用占比",
"type": "piechart",
"gridPos": {"x": 12, "y": 8, "w": 6, "h": 4},
"targets": [
{
"expr": "sum(increase(ai_call_count_total{service=\"chat-service\"}[1h])) by (model)"
}
]
}
]
}
5. Distributed Tracing: Zipkin/Jaeger Integration
5.1 Spring AI Tracing Auto-Configuration
Spring AI applications can ship traces to Zipkin or Jaeger. The examples below use the Spring Cloud Sleuth API; note that on Spring Boot 3.x Sleuth has been superseded by Micrometer Tracing, which exposes an equivalent model:
xml
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-spring-boot-starter</artifactId>
</dependency>
</dependencies>
yaml
spring:
application:
name: chat-service
sleuth:
sampler:
      probability: 1.0  # use 0.1-0.3 in production
zipkin:
base-url: http://localhost:9411
sender:
type: web
5.2 Custom Tracing Context
In complex enterprise scenarios, business context needs to flow into the AI traces:
java
import org.springframework.cloud.sleuth.BaggageInScope;
import org.springframework.cloud.sleuth.ScopedSpan;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import reactor.core.publisher.Flux;
import java.util.Map;
import java.util.HashMap;
@Component
public class AITracingService {
private final Tracer tracer;
private final ChatModel chatModel;
public AITracingService(Tracer tracer, ChatModel chatModel) {
this.tracer = tracer;
this.chatModel = chatModel;
}
    /**
     * An AI call wrapped in a trace span
     */
    public ChatResponse callWithTrace(Prompt prompt, Map<String, String> businessContext) {
        ScopedSpan span = tracer.startScopedSpan("ai.chat.call");
        try {
            // Business tags
            span.tag("ai.model", "gpt-4o");
            span.tag("ai.operation", "chat");
            businessContext.forEach(span::tag);
            span.event("start_chat_call");
            long startTime = System.currentTimeMillis();
            ChatResponse response = chatModel.call(prompt);
            long callDuration = System.currentTimeMillis() - startTime;
            // Record duration
            span.event("chat_completed");
            span.tag("ai.duration.ms", String.valueOf(callDuration));
            // Token usage tags, when available:
            // span.tag("ai.tokens.prompt", String.valueOf(promptTokens));
            // span.tag("ai.tokens.completion", String.valueOf(completionTokens));
            return response;
        } catch (Exception e) {
            span.error(e);
            span.tag("ai.error", e.getClass().getSimpleName());
            throw e;
        } finally {
            span.end();
        }
    }
    /**
     * A streaming call wrapped in a trace span
     */
    public Flux<ChatResponse> streamWithTrace(Prompt prompt) {
        ScopedSpan span = tracer.startScopedSpan("ai.chat.stream");
        span.tag("ai.model", "gpt-4o");
        span.tag("ai.operation", "stream");
        return chatModel.stream(prompt)
            .doOnComplete(() -> span.event("stream_completed"))
            .doOnError(e -> span.error(e))
            .doFinally(signal -> span.end());
    }
    /**
     * Cross-service tracing: propagate the traceId when calling external AI services
     */
    public Map<String, String> getTracingHeaders() {
        Map<String, String> headers = new HashMap<>();
        // Current trace/span identifiers
        Span currentSpan = tracer.currentSpan();
        if (currentSpan != null) {
            headers.put("X-B3-TraceId", currentSpan.context().traceId());
            headers.put("X-B3-SpanId", currentSpan.context().spanId());
            if (currentSpan.context().parentId() != null) {
                headers.put("X-B3-ParentSpanId", currentSpan.context().parentId());
            }
        }
        // Baggage entries
        try (BaggageInScope baggage = tracer.getBaggage("user-id")) {
            if (baggage != null) {
                headers.put("X-User-Id", baggage.getValue());
            }
        }
        return headers;
    }
}
5.3 Tracing the Advisor Chain
The Advisor chain is a core feature of Spring AI 2.0, and each Advisor's execution can be traced:
java
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.cloud.sleuth.ScopedSpan;
import org.springframework.cloud.sleuth.Tracer;
import reactor.core.publisher.Flux;
/**
 * Advisor chain configuration with tracing
 */
@Configuration
public class TracedAdvisorConfiguration {
    @Bean
    public Advisor tracedChatMemoryAdvisor(ChatMemory chatMemory, Tracer tracer) {
        // Wrap the original Advisor to add tracing
        return new TracingAdvisor(
            MessageChatMemoryAdvisor.builder(chatMemory)
                .build(),
            "chat_memory",
            tracer
        );
    }
    @Bean
    public Advisor tracedQuestionAnswerAdvisor(VectorStore vectorStore, Tracer tracer) {
        return new TracingAdvisor(
            QuestionAnswerAdvisor.builder(vectorStore)
                .build(),
            "question_answer",
            tracer
        );
    }
    /**
     * Tracing decorator. The advise(...) signature below is a simplified
     * sketch; adapt it to the advisor contracts of your Spring AI version.
     */
    static class TracingAdvisor implements Advisor {
        private final Advisor delegate;
        private final String name;
        private final Tracer tracer;
        public TracingAdvisor(Advisor delegate, String name, Tracer tracer) {
            this.delegate = delegate;
            this.name = name;
            this.tracer = tracer;
        }
        @Override
        public ChatResponse advise(ChatContext context, ChatResponse response) {
            long startTime = System.currentTimeMillis();
            ScopedSpan span = tracer.startScopedSpan("advisor." + name);
            try {
                span.tag("advisor.type", name);
                ChatResponse result = delegate.advise(context, response);
                span.tag("advisor.execution.success", "true");
                return result;
            } catch (Exception e) {
                span.error(e);
                throw e;
            } finally {
                span.tag("advisor.duration.ms",
                    String.valueOf(System.currentTimeMillis() - startTime));
                span.end();
            }
        }
        // delegate other methods to delegate...
    }
}
6. Advisor Chain Profiling and Bottleneck Diagnosis
6.1 How the Advisor Chain Executes
+------------------------------------------------------------------+
|                          User Request                            |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [1] SafeGuardAdvisor (content safety)                   ~5ms     |
|     - sensitive-word detection                                   |
|     - safety-policy interception                                 |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [2] ReReading Advisor (Re2)                             ~10ms    |
|     - prompt enhancement                                         |
|     - instruction reinforcement                                  |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [3] ChatMemoryAdvisor (conversation history)            ~2ms     |
|     - history loading                                            |
|     - token budget control                                       |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [4] QuestionAnswerAdvisor (RAG retrieval)               ~100-500ms|
|     - vector search                                              |
|     - context assembly                                           |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [5] ToolCallAdvisor (tool calling)                      ~50-2000ms|
|     - function selection                                         |
|     - call execution                                             |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [6] LastMaxTokenSizeContentPurger                       ~1ms     |
|     - token-limit enforcement                                    |
|     - context trimming                                           |
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
| [7] ChatModel (LLM call)                                ~500-3000ms|
+------------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------------+
|                            Response                              |
+------------------------------------------------------------------+
6.2 Advisor Chain Performance Monitoring
java
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.*;
import reactor.core.publisher.Flux;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
/**
 * Profiles execution time across the Advisor chain
 */
@Component
public class AdvisorChainProfiler {
private final MeterRegistry registry;
private final Map<String, Timer> advisorTimers = new ConcurrentHashMap<>();
public AdvisorChainProfiler(MeterRegistry registry) {
this.registry = registry;
}
    /**
     * Records the execution time of a single Advisor
     */
public void recordAdvisorExecution(String advisorName, long durationMs) {
Timer timer = advisorTimers.computeIfAbsent(advisorName, name ->
Timer.builder("advisor.execution.time")
.tag("advisor", name)
.description("Advisor execution time")
.register(registry)
);
timer.record(Duration.ofMillis(durationMs));
}
    /**
     * Creates an Advisor chain with per-Advisor profiling
     */
public List<Advisor> createProfiledAdvisorChain(
List<Advisor> originalAdvisors,
MeterRegistry registry) {
List<Advisor> profiled = new ArrayList<>();
for (int i = 0; i < originalAdvisors.size(); i++) {
Advisor original = originalAdvisors.get(i);
String advisorName = getAdvisorName(original);
int order = i;
profiled.add(new ProfiledAdvisor(original, advisorName, order, this));
}
return profiled;
}
private String getAdvisorName(Advisor advisor) {
        // Derived from the Advisor class name
String name = advisor.getClass().getSimpleName();
if (name.contains("$")) {
name = name.split("\\$")[0];
}
return name;
}
    /**
     * Advisor wrapper that records execution metrics
     */
static class ProfiledAdvisor implements Advisor {
private final Advisor delegate;
private final String name;
private final int order;
private final AdvisorChainProfiler profiler;
public ProfiledAdvisor(Advisor delegate, String name,
int order, AdvisorChainProfiler profiler) {
this.delegate = delegate;
this.name = name;
this.order = order;
this.profiler = profiler;
}
@Override
public ChatResponse advise(ChatContext context, ChatResponse response) {
long startTime = System.nanoTime();
try {
return delegate.advise(context, response);
} finally {
long durationMs = (System.nanoTime() - startTime) / 1_000_000;
profiler.recordAdvisorExecution(name, durationMs);
                // Slow-Advisor alert (use a proper logger in production)
                if (durationMs > 100) {
                    System.out.println("SLOW ADVISOR: " + name +
                        " took " + durationMs + "ms");
                }
}
}
}
@Override
public Flux<ChatResponse> adviseStream(ChatContext context,
Flux<ChatResponse> responseStream) {
long startTime = System.nanoTime();
return responseStream
.doFinally(signal -> {
long durationMs = (System.nanoTime() - startTime) / 1_000_000;
profiler.recordAdvisorExecution(name + "_stream", durationMs);
});
}
@Override
public int getOrder() {
return delegate.getOrder();
}
}
}
6.3 Bottleneck Diagnosis and Tuning Suggestions
java
import org.springframework.stereotype.Component;
import java.util.*;
import java.util.stream.Collectors;
/**
 * Analyzes performance bottlenecks in an Advisor chain
 */
@Component
public class AdvisorBottleneckAnalyzer {
    /**
     * Analyzes the chain and flags its dominant stages
     */
public BottleneckReport analyzeChain(List<AdvisorMetrics> metrics) {
        // Total chain time
long totalTime = metrics.stream()
.mapToLong(AdvisorMetrics::getDurationMs)
.sum();
        // Each Advisor's share of the total
List<AdvisorAnalysis> analysisList = metrics.stream()
.map(m -> {
double percentage = (m.getDurationMs() * 100.0) / totalTime;
return new AdvisorAnalysis(
m.getName(),
m.getDurationMs(),
percentage,
getSuggestion(m.getName(), percentage)
);
})
.sorted(Comparator.comparingDouble(AdvisorAnalysis::getPercentage).reversed())
.collect(Collectors.toList());
        // Flag bottlenecks (over 30% of total time)
List<String> bottlenecks = analysisList.stream()
.filter(a -> a.getPercentage() > 30)
.map(AdvisorAnalysis::getName)
.collect(Collectors.toList());
return new BottleneckReport(analysisList, totalTime, bottlenecks);
}
    private String getSuggestion(String advisorName, double percentage) {
        return switch (advisorName.toLowerCase()) {
            case "questionansweradvisor" ->
                "Optimize vector retrieval: check index builds, add caching, tune topK";
            case "toolcalladvisor" ->
                "Reduce the number of tool definitions; optimize tool execution logic";
            case "chatmemoryadvisor" ->
                "Use a distributed Redis cache; shrink the history window";
            case "chatmodel" ->
                "Consider a faster model; enable streaming responses";
            default -> "Keep monitoring and watch the performance trend";
        };
    }
    static class AdvisorMetrics {
        private final String name;
        private final long durationMs;
        AdvisorMetrics(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }
        public String getName() { return name; }
        public long getDurationMs() { return durationMs; }
    }
static class AdvisorAnalysis {
private String name;
private long durationMs;
private double percentage;
private String suggestion;
public AdvisorAnalysis(String name, long durationMs,
double percentage, String suggestion) {
this.name = name;
this.durationMs = durationMs;
this.percentage = percentage;
this.suggestion = suggestion;
}
        public String getName() { return name; }
        public long getDurationMs() { return durationMs; }
        public double getPercentage() { return percentage; }
        public String getSuggestion() { return suggestion; }
    }
static class BottleneckReport {
private List<AdvisorAnalysis> analysis;
private long totalTimeMs;
private List<String> bottlenecks;
public BottleneckReport(List<AdvisorAnalysis> analysis,
long totalTimeMs, List<String> bottlenecks) {
this.analysis = analysis;
this.totalTimeMs = totalTimeMs;
this.bottlenecks = bottlenecks;
}
// getters...
}
}
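The share-of-total logic above reduces to simple arithmetic. This self-contained sketch applies the same rule (flag any stage exceeding 30% of chain time); the example durations are illustrative figures in the range of the timings from the chain diagram:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BottleneckRule {
    // Flag every stage whose duration exceeds the threshold share of the total.
    static List<String> bottlenecks(Map<String, Long> durationsMs, double thresholdPct) {
        long total = durationsMs.values().stream().mapToLong(Long::longValue).sum();
        List<String> flagged = new ArrayList<>();
        durationsMs.forEach((name, ms) -> {
            if (ms * 100.0 / total > thresholdPct) flagged.add(name);
        });
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Long> chain = Map.of(
                "SafeGuardAdvisor", 5L,
                "QuestionAnswerAdvisor", 400L,
                "ChatModel", 1800L,
                "ChatMemoryAdvisor", 2L
        );
        // Total is 2207 ms; only ChatModel (~81.6%) crosses the 30% line
        System.out.println(bottlenecks(chain, 30));
    }
}
```

This is also why the LLM call itself usually dominates: shaving milliseconds off a 2 ms memory advisor is pointless while the model call accounts for most of the chain.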
7. The EVAL Framework: LLM-Based Response Quality Evaluation
7.1 EVAL Framework Overview
Spring AI 2.0 ships an EVAL framework for automatically assessing the quality of AI responses:
+------------------------------------------------------------------+
|                     EVAL Evaluation Framework                    |
+------------------------------------------------------------------+
|                                                                  |
|  +--------------+      +-------------+      +---------------+    |
|  | Original     |      |    LLM      |      |  Evaluation   |    |
|  | user Prompt  |----->|  response   |----->|   outcome     |    |
|  +--------------+      +-------------+      +---------------+    |
|        |                     |                      |            |
|        v                     v                      v            |
|  +--------------+      +-------------+      +---------------+    |
|  |  Reference   |      |  Scoring    |      |  Evaluation   |    |
|  |  answer      |      |  criteria   |      |  report       |    |
|  |  (optional)  |      | (Criteria)  |      | (score+reason)|    |
|  +--------------+      +-------------+      +---------------+    |
|                                                                  |
+------------------------------------------------------------------+
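The composite score used later in this section is a plain weighted sum. Sketched stand-alone below; the 0.4/0.3/0.2/0.1 weights and the 0.7 pass threshold are this article's illustrative values, not constants from Spring AI itself:

```java
public class CompositeScore {
    // Weighted blend: accuracy 40%, completeness 30%,
    // hallucination check 20%, toxicity 10%; pass at >= 0.7
    static double overall(double accuracy, double completeness,
                          boolean hallucinationFree, double toxicity) {
        return accuracy * 0.4 + completeness * 0.3
                + (hallucinationFree ? 1.0 : 0.0) * 0.2 + toxicity * 0.1;
    }

    public static void main(String[] args) {
        double score = overall(0.9, 0.8, true, 1.0);
        // 0.36 + 0.24 + 0.20 + 0.10 = 0.90 -> PASS
        System.out.println(score >= 0.7 ? "PASS" : "FAIL");
    }
}
```

Note that a detected hallucination zeroes out a full 20% of the score, so an otherwise perfect answer (1.0 everywhere else) still only reaches 0.8 and a mediocre one fails outright.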
7.2 Implementing Response Quality Evaluation
java
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.regex.Pattern;
/**
 * AI response quality evaluator
 */
@Component
public class ResponseQualityEvaluator {
private final ChatModel chatModel;
public ResponseQualityEvaluator(ChatModel chatModel) {
this.chatModel = chatModel;
}
    /**
     * Composite quality evaluation
     */
public EvaluationResult evaluate(String question, String response,
String referenceAnswer) {
        // 1. Accuracy
        double accuracyScore = evaluateAccuracy(question, response, referenceAnswer);
        // 2. Completeness
        double completenessScore = evaluateCompleteness(question, response);
        // 3. Hallucination detection
        HallucinationResult hallucination = detectHallucination(question, response, referenceAnswer);
        // 4. Toxicity screening
        double toxicityScore = evaluateToxicity(response);
        // Weighted composite score
        double overallScore = (accuracyScore * 0.4 +
                completenessScore * 0.3 +
                (hallucination.isPassed() ? 1.0 : 0.0) * 0.2 +
                toxicityScore * 0.1);
return new EvaluationResult(
accuracyScore,
completenessScore,
hallucination,
toxicityScore,
overallScore,
overallScore >= 0.7 ? "PASS" : "FAIL"
);
}
    /**
     * Accuracy evaluation
     */
private double evaluateAccuracy(String question, String response,
String referenceAnswer) {
        String evaluationPrompt = String.format("""
                You are a professional AI evaluation expert. Assess the accuracy of the AI answer below.
                Question: %s
                AI answer: %s
                Reference answer: %s
                Score along these dimensions:
                1. Does the answer actually address the question?
                2. How consistent is it with the reference answer?
                3. Are there factual errors?
                Return the result as JSON:
                {
                  "score": 0.0-1.0,
                  "reason": "scoring rationale",
                  "errors": ["error 1", "error 2"]
                }
                """, question, response, referenceAnswer);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(evaluationPrompt))
);
return parseScore(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
            return 0.5; // fall back to a neutral score
}
}
    /**
     * Completeness evaluation
     */
private double evaluateCompleteness(String question, String response) {
        String evaluationPrompt = String.format("""
                Assess the completeness of the answer below.
                Question: %s
                Answer: %s
                Dimensions:
                1. Does it cover every aspect of the question?
                2. Is the explanation sufficiently detailed?
                3. Is any important information missing?
                Return JSON:
                {
                  "score": 0.0-1.0,
                  "missing_aspects": ["missing aspect 1", "missing aspect 2"],
                  "reason": "scoring rationale"
                }
                """, question, response);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(evaluationPrompt))
);
return parseScore(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
return 0.5;
}
}
    /**
     * Hallucination detection
     */
private HallucinationResult detectHallucination(String question, String response,
String referenceAnswer) {
        String prompt = String.format("""
                You are a fact-checking expert. Determine whether the answer below contains hallucinations (fabricated or unverifiable information).
                Question: %s
                Answer: %s
                Reference: %s
                Hallucination types:
                1. Factual error - contradicts known facts
                2. Unverifiable - cannot be confirmed either way
                3. Over-claiming - conclusions that do not follow from the premises
                4. Fabricated citation - references or sources that do not exist
                Return JSON:
                {
                  "is_hallucination": true/false,
                  "hallucination_type": "factual/unverifiable/over_claimed/fabricated/null",
                  "confidence": 0.0-1.0,
                  "problematic_sentences": ["sentence 1", "sentence 2"],
                  "explanation": "explanation"
                }
                """, question, response, referenceAnswer != null ? referenceAnswer : "no reference");
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(prompt))
);
return parseHallucinationResult(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
            return new HallucinationResult(false, "error", 0.0,
                List.of(), "evaluation failed; returning the default result");
}
}
    /**
     * Toxicity screening
     */
    private double evaluateToxicity(String response) {
        // Simple keyword screen plus an LLM pass. Production systems should use
        // a dedicated toxicity-detection model. The keyword list below targets
        // Chinese-language content; localize it for your deployment.
        List<String> toxicPatterns = List.of(
            "暴力", "仇恨", "歧视", "违法", "犯罪",
            "赌博", "毒品", "色情", "自杀", "自残"
        );
boolean containsToxic = toxicPatterns.stream()
.anyMatch(p -> response.contains(p));
if (containsToxic) {
return 0.0;
}
        // Fine-grained LLM check
        String toxicityPrompt = String.format("""
                Assess whether the following content is toxic.
                Content: %s
                Return JSON:
                {
                  "is_toxic": true/false,
                  "toxicity_level": 0.0-1.0,
                  "toxic_categories": ["category 1"],
                  "reason": "rationale"
                }
                """, response);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(toxicityPrompt))
);
return 1.0 - parseToxicity(chatResponse.getResult().getOutput().getText());
} catch (Exception e) {
return 1.0;
}
}
private double parseScore(String json) {
        // Simplified parsing; use Jackson in production
try {
var pattern = Pattern.compile("\"score\"\\s*:\\s*([0-9.]+)");
var matcher = pattern.matcher(json);
if (matcher.find()) {
return Double.parseDouble(matcher.group(1));
}
} catch (Exception e) {
// ignore
}
return 0.5;
}
    private HallucinationResult parseHallucinationResult(String json) {
        try {
            // Simplified parsing; use Jackson in production
            boolean isHallucination = json.matches("(?s).*\"is_hallucination\"\\s*:\\s*true.*");
            // "passed" means no hallucination was detected
            return new HallucinationResult(
                !isHallucination,
                "factual",
                0.8,
                List.of(),
                "parsed from response"
            );
        } catch (Exception e) {
            return new HallucinationResult(false, "unknown", 0.0, List.of(), "parse error");
        }
    }
private double parseToxicity(String json) {
try {
var pattern = Pattern.compile("\"toxicity_level\"\\s*:\\s*([0-9.]+)");
var matcher = pattern.matcher(json);
if (matcher.find()) {
return Double.parseDouble(matcher.group(1));
}
} catch (Exception e) {
// ignore
}
return 0.0;
}
    // Nested result types
public static class EvaluationResult {
private final double accuracyScore;
private final double completenessScore;
private final HallucinationResult hallucination;
private final double toxicityScore;
private final double overallScore;
private final String passStatus;
public EvaluationResult(double accuracyScore, double completenessScore,
HallucinationResult hallucination, double toxicityScore,
double overallScore, String passStatus) {
this.accuracyScore = accuracyScore;
this.completenessScore = completenessScore;
this.hallucination = hallucination;
this.toxicityScore = toxicityScore;
this.overallScore = overallScore;
this.passStatus = passStatus;
}
// getters
public double getAccuracyScore() { return accuracyScore; }
public double getCompletenessScore() { return completenessScore; }
public HallucinationResult getHallucination() { return hallucination; }
public double getToxicityScore() { return toxicityScore; }
public double getOverallScore() { return overallScore; }
public String getPassStatus() { return passStatus; }
}
public static class HallucinationResult {
private final boolean passed;
private final String type;
private final double confidence;
private final List<String> problematicSentences;
private final String explanation;
public HallucinationResult(boolean passed, String type, double confidence,
List<String> problematicSentences, String explanation) {
this.passed = passed;
this.type = type;
this.confidence = confidence;
this.problematicSentences = problematicSentences;
this.explanation = explanation;
}
public boolean isPassed() { return passed; }
public String getType() { return type; }
public double getConfidence() { return confidence; }
public List<String> getProblematicSentences() { return problematicSentences; }
public String getExplanation() { return explanation; }
}
}
7.3 In Practice: Building an Enterprise-Grade Hallucination Detection System
java
import org.springframework.stereotype.Component;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
/**
 * Enterprise-grade hallucination detection system
 */
@Component
public class EnterpriseHallucinationDetector {
private final ChatModel chatModel;
private final VectorStore referenceStore;
private final Map<String, ClaimCache> claimCache = new ConcurrentHashMap<>();
    // Fact store: fast verification of common claims
private final Map<String, Boolean> factualDatabase = new ConcurrentHashMap<>();
public EnterpriseHallucinationDetector(ChatModel chatModel,
VectorStore referenceStore) {
this.chatModel = chatModel;
this.referenceStore = referenceStore;
initializeFactualDatabase();
}
    /**
     * Multi-layer hallucination detection
     */
public HallucinationReport detect(String question, String response) {
List<HallucinationIssue> issues = new ArrayList<>();
        // Layer 1: rule-based checks (fast)
        List<HallucinationIssue> ruleIssues = ruleBasedDetection(response);
        issues.addAll(ruleIssues);
        // Layer 2: fact-store verification
        List<HallucinationIssue> factIssues = factualVerification(response);
        issues.addAll(factIssues);
        // Layer 3: semantic retrieval verification
        if (referenceStore != null) {
            List<HallucinationIssue> semanticIssues = semanticVerification(question, response);
            issues.addAll(semanticIssues);
        }
        // Layer 4: deep LLM analysis
        List<HallucinationIssue> llmIssues = llmDeepAnalysis(question, response);
        issues.addAll(llmIssues);
        // Aggregate confidence
double confidence = calculateConfidence(issues);
return new HallucinationReport(
issues.isEmpty(),
confidence,
issues,
response.length()
);
}
/**
* 第一层:规则检测
*/
private List<HallucinationIssue> ruleBasedDetection(String response) {
List<HallucinationIssue> issues = new ArrayList<>();
// 检测绝对性表述
Pattern absolutePattern = Pattern.compile(
"(绝对|肯定|必然|100%|所有|永远|从不|不可否认|毫无疑问)"
);
if (absolutePattern.matcher(response).find()) {
issues.add(new HallucinationIssue(
"absolute_statement",
"检测到绝对性表述,可能存在过度推断",
0.3,
"warning"
));
}
// 检测虚构引用
Pattern citationPattern = Pattern.compile(
"(《[^》]+》|出自|根据|研究表明|数据显示|据[^\\s]+报道)"
);
if (citationPattern.matcher(response).find()) {
// 需要进一步验证引用真实性
issues.add(new HallucinationIssue(
"citation_found",
"检测到引用表述,需验证真实性",
0.2,
"info"
));
}
// 检测模糊表述
Pattern vaguePattern = Pattern.compile(
"(可能|也许|大概|似乎|据说|有人|某些人)"
);
if (vaguePattern.matcher(response).find()) {
issues.add(new HallucinationIssue(
"vague_statement",
"检测到模糊表述,信息来源不明确",
0.2,
"info"
));
}
return issues;
}
/**
* 第二层:事实库验证
*/
private List<HallucinationIssue> factualVerification(String response) {
List<HallucinationIssue> issues = new ArrayList<>();
// 提取关键论断(claim)并逐条验证
List<String> claims = extractClaims(response);
for (String claim : claims) {
if (factualDatabase.containsKey(claim)) {
boolean isFact = factualDatabase.get(claim);
if (!isFact) {
issues.add(new HallucinationIssue(
"fact_check_failed",
"声称 '" + claim + "' 与已知事实不符",
0.9,
"error"
));
}
}
}
return issues;
}
/**
* 第三层:语义检索验证
*/
private List<HallucinationIssue> semanticVerification(String question, String response) {
List<HallucinationIssue> issues = new ArrayList<>();
try {
// 检索相关文档
List<Document> relevantDocs = referenceStore.similaritySearch(
SearchRequest.builder().query(response).topK(3).build()
);
if (relevantDocs.isEmpty()) {
issues.add(new HallucinationIssue(
"no_reference_found",
"未找到相关参考文档支持该回答",
0.5,
"warning"
));
} else {
// 计算支持度
double support = calculateSupport(response, relevantDocs);
if (support < 0.3) {
issues.add(new HallucinationIssue(
"low_support",
String.format("参考文档支持度仅为 %.0f%%", support * 100),
0.7,
"warning"
));
}
}
} catch (Exception e) {
// 检索失败不阻断主流程,仅记录日志
System.err.println("语义检索验证失败: " + e.getMessage());
}
return issues;
}
/**
* 第四层:LLM 深度分析
*/
private List<HallucinationIssue> llmDeepAnalysis(String question, String response) {
List<HallucinationIssue> issues = new ArrayList<>();
String analysisPrompt = String.format("""
你是一个专业的幻觉检测专家。请分析以下回答是否存在幻觉。
原始问题:%s
回答内容:%s
请识别:
1. 事实性错误(与客观事实不符)
2. 不可验证的声称(无法确认真假)
3. 过度推断(从给定前提得出不必然的结论)
4. 内部矛盾(回答内部逻辑不一致)
返回严格的 JSON 格式:
{
"has_hallucination": true/false,
"issues": [
{
"type": "factual_error/unverifiable/over_claim/contradiction",
"description": "问题描述",
"evidence": "证据或原文引用",
"severity": "high/medium/low"
}
],
"overall_confidence": 0.0-1.0,
"summary": "一句话总结"
}
""", question, response);
try {
ChatResponse chatResponse = chatModel.call(
new Prompt(new UserMessage(analysisPrompt))
);
// 解析LLM返回的JSON并转换为issues
// 简化处理
String result = chatResponse.getResult().getOutput().getText();
if (result.matches("(?s).*\"has_hallucination\"\\s*:\\s*true.*")) {
issues.add(new HallucinationIssue(
"llm_detected",
"LLM 检测到潜在幻觉",
0.6,
"warning"
));
}
} catch (Exception e) {
// LLM 调用失败不阻断主流程,仅记录日志
System.err.println("LLM 深度分析失败: " + e.getMessage());
}
return issues;
}
private List<String> extractClaims(String text) {
// 简化:按句子提取
List<String> claims = new ArrayList<>();
String[] sentences = text.split("[。!?]");
for (String sentence : sentences) {
if (sentence.length() > 10 && sentence.length() < 100) {
claims.add(sentence.trim());
}
}
return claims;
}
private double calculateSupport(String response, List<Document> documents) {
// 简化:基于关键词重叠度
Set<String> responseWords = new HashSet<>(Arrays.asList(response.split("\\s+")));
double totalOverlap = 0;
for (Document doc : documents) {
Set<String> docWords = new HashSet<>(Arrays.asList(doc.getText().split("\\s+")));
Set<String> overlap = new HashSet<>(responseWords);
overlap.retainAll(docWords);
totalOverlap += (double) overlap.size() / responseWords.size();
}
return totalOverlap / documents.size();
}
private double calculateConfidence(List<HallucinationIssue> issues) {
if (issues.isEmpty()) {
return 0.0;
}
double total = issues.stream()
.mapToDouble(i -> i.severityLevel.equals("error") ? 1.0 :
i.severityLevel.equals("warning") ? 0.6 : 0.3)
.sum();
return Math.min(total, 1.0);
}
private void initializeFactualDatabase() {
// 初始化常见事实(实际生产应从数据库加载)
factualDatabase.put("太阳从东方升起", true);
factualDatabase.put("水的化学式是H2O", true);
factualDatabase.put("地球围绕太阳转", true);
}
// 内部类
public static class HallucinationReport {
private final boolean passed;
private final double confidence;
private final List<HallucinationIssue> issues;
private final int responseLength;
public HallucinationReport(boolean passed, double confidence,
List<HallucinationIssue> issues, int responseLength) {
this.passed = passed;
this.confidence = confidence;
this.issues = issues;
this.responseLength = responseLength;
}
public boolean isPassed() { return passed; }
public double getConfidence() { return confidence; }
public List<HallucinationIssue> getIssues() { return issues; }
public int getResponseLength() { return responseLength; }
}
public static class HallucinationIssue {
private final String type;
private final String description;
private final double severity;
private final String severityLevel;
public HallucinationIssue(String type, String description,
double severity, String severityLevel) {
this.type = type;
this.description = description;
this.severity = severity;
this.severityLevel = severityLevel;
}
public String getType() { return type; }
public String getDescription() { return description; }
public double getSeverity() { return severity; }
public String getSeverityLevel() { return severityLevel; }
}
}
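第一层规则检测、综合置信度聚合与第三层的关键词重叠支持度,都是可以脱离 Spring 上下文独立验证的纯函数。下面是一个仅依赖 JDK 的最小示例(类名 RuleDetectionDemo 为演示自拟,规则与权重和上文保持一致):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class RuleDetectionDemo {

    private static final Pattern ABSOLUTE =
            Pattern.compile("(绝对|肯定|必然|100%|所有|永远|从不|不可否认|毫无疑问)");
    private static final Pattern VAGUE =
            Pattern.compile("(可能|也许|大概|似乎|据说|有人|某些人)");

    // 对应第一层规则检测:返回命中的规则类型
    public static List<String> detect(String response) {
        List<String> hits = new ArrayList<>();
        if (ABSOLUTE.matcher(response).find()) hits.add("absolute_statement");
        if (VAGUE.matcher(response).find()) hits.add("vague_statement");
        return hits;
    }

    // 对应 calculateConfidence:error=1.0、warning=0.6、info=0.3,累加后封顶 1.0
    public static double confidence(List<String> severityLevels) {
        double total = 0;
        for (String level : severityLevels) {
            total += "error".equals(level) ? 1.0
                    : "warning".equals(level) ? 0.6 : 0.3;
        }
        return Math.min(total, 1.0);
    }

    // 对应 calculateSupport:按空白切词,计算每篇文档的重叠比例并取平均
    public static double support(String response, List<String> docs) {
        Set<String> respWords = new HashSet<>(Arrays.asList(response.split("\\s+")));
        double total = 0;
        for (String doc : docs) {
            Set<String> overlap = new HashSet<>(respWords);
            overlap.retainAll(new HashSet<>(Arrays.asList(doc.split("\\s+"))));
            total += (double) overlap.size() / respWords.size();
        }
        return total / docs.size();
    }

    public static void main(String[] args) {
        String answer = "所有 Java 程序绝对不会发生内存泄漏。";
        System.out.println("命中规则: " + detect(answer));
        System.out.println("聚合置信度: " + confidence(List.of("warning", "info")));
    }
}
```

注意关键词重叠对中文并不友好(中文句子按空白切词几乎切不开),生产中应替换为分词器或向量相似度,这也是上文将其标注为"简化"实现的原因。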
7.4 EVAL 集成到 Advisor 链
java
import org.springframework.ai.chat.client.advisor.Advisor;
import org.springframework.ai.chat.model.ChatContext;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.stereotype.Component;
/**
 * EVAL 评估 Advisor
 * 注:此处的 Advisor 接口签名为示意写法,实际请以所用
 * Spring AI 版本的 Advisor API 为准
 */
@Component
public class EvaluationAdvisor implements Advisor {
private final ResponseQualityEvaluator evaluator;
private final EnterpriseHallucinationDetector hallucinationDetector;
private final double qualityThreshold;
public EvaluationAdvisor(ResponseQualityEvaluator evaluator,
EnterpriseHallucinationDetector hallucinationDetector,
double qualityThreshold) {
this.evaluator = evaluator;
this.hallucinationDetector = hallucinationDetector;
this.qualityThreshold = qualityThreshold;
}
@Override
public ChatResponse advise(ChatContext context, ChatResponse response) {
// 获取用户问题
String question = extractQuestion(context);
String answer = extractAnswer(response);
// 1. 幻觉检测
EnterpriseHallucinationDetector.HallucinationReport hallucinationReport =
hallucinationDetector.detect(question, answer);
if (!hallucinationReport.isPassed()) {
// 记录警告
System.out.println("HALLUCINATION DETECTED: " +
hallucinationReport.getConfidence());
// 决策:拒绝/警告/重新生成
if (hallucinationReport.getConfidence() > 0.8) {
// 高置信度幻觉:可在此拦截响应、改写为兜底回复,
// 或在响应元数据中打标后交由下游处理
System.err.println("高置信度幻觉,建议人工复核: " + question);
}
}
// 2. 质量评估(可选,异步执行)
// 实际生产应异步处理,避免影响响应延迟
return response;
}
private String extractQuestion(ChatContext context) {
if (context != null && !context.getMessages().isEmpty()) {
return context.getMessages().get(0).getText();
}
return "";
}
private String extractAnswer(ChatResponse response) {
return response.getResult().getOutput().getText();
}
@Override
public int getOrder() {
return Integer.MAX_VALUE; // 最后执行
}
}
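将 EvaluationAdvisor 挂载到 ChatClient 的方式大致如下(接线示意,假设使用 Spring AI 的 ChatClient fluent API,变量名为自拟,非完整可运行代码):

```java
// 构建 ChatClient 时注册评估 Advisor;其 getOrder() 返回 MAX_VALUE,
// 保证评估在链路最后执行,不影响前置的 RAG、记忆等 Advisor
ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(evaluationAdvisor)
        .build();

String answer = chatClient.prompt()
        .user("请解释 Spring Boot 自动配置原理")
        .call()
        .content();
```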
八、完整集成示例
8.1 一站式可观测性配置
java
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
* Spring AI 可观测性自动配置
*/
@Configuration
public class AIObservabilityConfiguration {
@Bean
@ConfigurationProperties(prefix = "ai.observability")
public AIObservabilityProperties observabilityProperties() {
return new AIObservabilityProperties();
}
@Bean
public AIMetricsCollector aiMetricsCollector(
MeterRegistry registry,
AIObservabilityProperties properties) {
return new AIMetricsCollector(registry, properties);
}
@Bean
public AdvisorChainProfiler advisorChainProfiler(MeterRegistry registry) {
return new AdvisorChainProfiler(registry);
}
@Bean
public ResponseQualityEvaluator responseQualityEvaluator(ChatModel chatModel) {
return new ResponseQualityEvaluator(chatModel);
}
@Bean
public EnterpriseHallucinationDetector enterpriseHallucinationDetector(
ChatModel chatModel,
VectorStore vectorStore) {
return new EnterpriseHallucinationDetector(chatModel, vectorStore);
}
@Bean
public EvaluationAdvisor evaluationAdvisor(
ResponseQualityEvaluator evaluator,
EnterpriseHallucinationDetector detector,
AIObservabilityProperties properties) {
return new EvaluationAdvisor(evaluator, detector,
properties.getQualityThreshold());
}
// 配置属性类
static class AIObservabilityProperties {
private boolean enabled = true;
private double qualityThreshold = 0.7;
private boolean enableEvaluation = false;
private boolean enableHallucinationDetection = true;
// getters and setters
public boolean isEnabled() { return enabled; }
public void setEnabled(boolean enabled) { this.enabled = enabled; }
public double getQualityThreshold() { return qualityThreshold; }
public void setQualityThreshold(double qualityThreshold) {
this.qualityThreshold = qualityThreshold;
}
public boolean isEnableEvaluation() { return enableEvaluation; }
public void setEnableEvaluation(boolean enableEvaluation) {
this.enableEvaluation = enableEvaluation;
}
public boolean isEnableHallucinationDetection() {
return enableHallucinationDetection;
}
public void setEnableHallucinationDetection(boolean enableHallucinationDetection) {
this.enableHallucinationDetection = enableHallucinationDetection;
}
}
}
8.2 application.yml 完整配置
yaml
spring:
  application:
    name: ai-chat-service
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
    vectorstore:
      redis:
        url: redis://localhost:6379
        index-name: ai-docs

# 自定义可观测性配置(对应 @ConfigurationProperties(prefix = "ai.observability"))
ai:
  observability:
    enabled: true
    quality-threshold: 0.7
    enable-evaluation: true
    enable-hallucination-detection: true
    metrics:
      latency-percentiles: 0.5,0.95,0.99
      token-price:
        gpt-4o-input: 5.0
        gpt-4o-output: 15.0
        gpt-4o-mini-input: 0.15
        gpt-4o-mini-output: 0.6

# 监控配置(Spring Boot 3.x Actuator)
management:
  endpoints:
    web:
      exposure:
        include: prometheus,metrics,health,info
  metrics:
    tags:
      application: ${spring.application.name}
  prometheus:
    metrics:
      export:
        enabled: true
  tracing:
    sampling:
      probability: 0.3
  # Zipkin 上报端点(需引入 micrometer-tracing 桥接与 zipkin-reporter 相关依赖)
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
九、总结与最佳实践
9.1 可观测性实施路线图
阶段一:基础监控(1-2周)
├── 集成 Micrometer + Prometheus
├── 采集基础延迟、错误指标
└── 配置 Grafana 看板
阶段二:成本控制(2-3周)
├── 接入 Token 计数
├── 实现成本计算
└── 配置预算告警
阶段三:链路追踪(2-3周)
├── 集成 Zipkin/Jaeger
├── 自定义业务标签
└── Advisor 链追踪
阶段四:质量保障(3-4周)
├── 实现 EVAL 评估框架
├── 部署幻觉检测系统
└── 建立质量门禁
9.2 关键指标告警规则
| 指标 | 告警阈值 | 说明 |
|---|---|---|
| ai.call.latency.p99 | > 5000ms | LLM 调用超时 |
| ai.call.errors.rate | > 5% | 调用失败率过高 |
| ai.cost.hourly | > $100 | 小时成本超预算 |
| advisor.execution.time | > 1000ms | 单个 Advisor 慢 |
| hallucination.confidence | > 0.8 | 高置信度幻觉 |
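上表的阈值可以落地为 Prometheus Alerting Rules。下面是一个示意片段(指标名对应本文自定义指标的 Prometheus 导出形式,实际名称以 /actuator/prometheus 暴露的为准):

```yaml
groups:
  - name: ai-service-alerts
    rules:
      # LLM 调用 P99 延迟 > 5s
      - alert: AiCallLatencyP99High
        expr: histogram_quantile(0.99, sum(rate(ai_call_latency_seconds_bucket[5m])) by (le)) > 5
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "LLM 调用 P99 延迟超过 5s"
      # 调用失败率 > 5%
      - alert: AiCallErrorRateHigh
        expr: sum(rate(ai_call_errors_total[5m])) / sum(rate(ai_call_total[5m])) > 0.05
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "AI 调用失败率超过 5%"
```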
9.3 架构师视角的思考
企业级 AI 应用的可观测性建设,本质上是在解决三个核心问题:
- 可观测性 -> 可控性:通过监控延迟、错误率、成本,确保 AI 服务在可控范围内运行
- 可观测性 -> 质量保障:通过 EVAL 框架和幻觉检测,把控 AI 输出的质量底线
- 可观测性 -> 持续优化:通过 Advisor 链剖析,识别性能瓶颈,指导架构优化
Spring AI 2.0 的可观测性体系与 Spring 生态深度集成,对于已有 Spring Boot 技术栈的团队来说,是构建企业级 AI 应用的最佳选择。
记住:AI 应用的可观测性不是锦上添花,而是生产落地的必备基础设施。在追求 AI 能力的同时,必须同步建设与之匹配的可观测性能力。
参考资源:
- Spring AI 官方文档:https://docs.spring.io/spring-ai/reference/
- Micrometer 官方文档:https://micrometer.io/docs
- JTokkit GitHub:https://github.com/knuddelsgmbh/jtokkit
- Grafana 官方模板库:https://grafana.com/grafana/dashboards/