A Look at Spring AI's Evaluator

This article takes a closer look at Spring AI's Evaluator.

Evaluator

spring-ai-client-chat/src/main/java/org/springframework/ai/evaluation/Evaluator.java

java
@FunctionalInterface
public interface Evaluator {

	EvaluationResponse evaluate(EvaluationRequest evaluationRequest);

	default String doGetSupportingData(EvaluationRequest evaluationRequest) {
		List<Document> data = evaluationRequest.getDataList();
		return data.stream()
			.map(Document::getText)
			.filter(StringUtils::hasText)
			.collect(Collectors.joining(System.lineSeparator()));
	}

}

The Evaluator interface defines a single evaluate method for assessing AI-generated content, for example to check that the model has not produced a hallucinated response. It has two implementations: RelevancyEvaluator and FactCheckingEvaluator.
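
Since Evaluator is a @FunctionalInterface with a single evaluate method, it is also easy to plug in your own evaluation logic. Below is a minimal, hypothetical sketch (the class name and the naive containment check are mine, not part of Spring AI) that only passes when the response text literally appears in the supporting context:

java
import java.util.Collections;

import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.Evaluator;

// A trivial custom Evaluator: passes only when the (non-blank) response text
// appears verbatim in the supporting context assembled by doGetSupportingData.
public class ContainedInContextEvaluator implements Evaluator {

	@Override
	public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
		String response = evaluationRequest.getResponseContent();
		String context = doGetSupportingData(evaluationRequest).toLowerCase();

		boolean pass = response != null && !response.isBlank()
				&& context.contains(response.toLowerCase());

		String feedback = pass ? "response is contained in the context"
				: "response is not literally contained in the context";
		return new EvaluationResponse(pass, feedback, Collections.emptyMap());
	}

}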

EvaluationRequest

org/springframework/ai/evaluation/EvaluationRequest.java

java
public class EvaluationRequest {

	private final String userText;

	private final List<Document> dataList;

	private final String responseContent;

	public EvaluationRequest(String userText, String responseContent) {
		this(userText, Collections.emptyList(), responseContent);
	}

	public EvaluationRequest(List<Document> dataList, String responseContent) {
		this("", dataList, responseContent);
	}

	public EvaluationRequest(String userText, List<Document> dataList, String responseContent) {
		this.userText = userText;
		this.dataList = dataList;
		this.responseContent = responseContent;
	}

	//......
}	

EvaluationRequest defines three properties: userText, dataList and responseContent. userText is the user's input, dataList is the context data (for example the documents appended by RAG), and responseContent is the AI model's response.
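
For illustration, here is a small sketch of the three constructors (the question, the answer and the Document contents are made-up values):

java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;

class EvaluationRequestExamples {

	void build() {
		// hypothetical values, just to show the three constructors
		String userText = "What is the purpose of Carina?";
		String responseContent = "Carina is a ...";
		List<Document> retrievedDocs = List.of(new Document("Carina is a project that ..."));

		// user question + RAG context + model answer
		EvaluationRequest full = new EvaluationRequest(userText, retrievedDocs, responseContent);

		// context + answer only (userText defaults to "")
		EvaluationRequest contextOnly = new EvaluationRequest(retrievedDocs, responseContent);

		// question + answer only (dataList defaults to an empty list)
		EvaluationRequest noContext = new EvaluationRequest(userText, responseContent);
	}

}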

EvaluationResponse

org/springframework/ai/evaluation/EvaluationResponse.java

java
public class EvaluationResponse {

	private final boolean pass;

	private final float score;

	private final String feedback;

	private final Map<String, Object> metadata;

	@Deprecated
	public EvaluationResponse(boolean pass, float score, String feedback, Map<String, Object> metadata) {
		this.pass = pass;
		this.score = score;
		this.feedback = feedback;
		this.metadata = metadata;
	}

	public EvaluationResponse(boolean pass, String feedback, Map<String, Object> metadata) {
		this.pass = pass;
		this.score = 0;
		this.feedback = feedback;
		this.metadata = metadata;
	}

	//......
}	

EvaluationResponse defines the pass, score, feedback and metadata properties (the constructor that also takes a score is marked @Deprecated).
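
A brief sketch of how calling code might consume an EvaluationResponse; the getter names are assumed to mirror the fields above (isPass() is also what the tests below assert on):

java
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.Evaluator;

class EvaluationResponseExamples {

	// summarize the evaluation outcome as a single line
	String describe(Evaluator evaluator, EvaluationRequest request) {
		EvaluationResponse response = evaluator.evaluate(request);
		return (response.isPass() ? "PASS" : "FAIL")
				+ ", score=" + response.getScore()
				+ ", feedback=" + response.getFeedback();
	}

}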

RelevancyEvaluator

org/springframework/ai/evaluation/RelevancyEvaluator.java

java
public class RelevancyEvaluator implements Evaluator {

	private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
				Your task is to evaluate if the response for the query
				is in line with the context information provided.\n
				You have two options to answer. Either YES/ NO.\n
				Answer - YES, if the response for the query
				is in line with context information otherwise NO.\n
				Query: \n {query}\n
				Response: \n {response}\n
				Context: \n {context}\n
				Answer: "
			""";

	private final ChatClient.Builder chatClientBuilder;

	public RelevancyEvaluator(ChatClient.Builder chatClientBuilder) {
		this.chatClientBuilder = chatClientBuilder;
	}

	@Override
	public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {

		var response = evaluationRequest.getResponseContent();
		var context = doGetSupportingData(evaluationRequest);

		String evaluationResponse = this.chatClientBuilder.build()
			.prompt()
			.user(userSpec -> userSpec.text(DEFAULT_EVALUATION_PROMPT_TEXT)
				.param("query", evaluationRequest.getUserText())
				.param("response", response)
				.param("context", context))
			.call()
			.content();

		boolean passing = false;
		float score = 0;
		if (evaluationResponse.toLowerCase().contains("yes")) {
			passing = true;
			score = 1;
		}

		return new EvaluationResponse(passing, score, "", Collections.emptyMap());
	}

}

RelevancyEvaluator asks the AI model to judge whether the response is in line with the provided context information and to answer YES or NO. If the answer contains yes, passing is set to true and score to 1; otherwise passing stays false and score stays 0.

Example

java
@Test
void testEvaluation() {

    dataController.delete();
    dataController.load();

    String userText = "What is the purpose of Carina?";

    ChatResponse response = ChatClient.builder(chatModel)
            .build().prompt()
            .advisors(new QuestionAnswerAdvisor(vectorStore))
            .user(userText)
            .call()
            .chatResponse();
    String responseContent = response.getResult().getOutput().getContent();

    var relevancyEvaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));

    EvaluationRequest evaluationRequest = new EvaluationRequest(userText,
            (List<Document>) response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS), responseContent);

    EvaluationResponse evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest);

    assertTrue(evaluationResponse.isPass(), "Response is not relevant to the question");

}

Here we first ask the AI the userText question (through a QuestionAnswerAdvisor backed by the vector store), then hand responseContent together with the documents stored under QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS to relevancyEvaluator, which asks the AI to evaluate the answer.
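
Beyond test assertions, the same evaluator can serve as a runtime guardrail. The following is only a sketch of that idea: the class, method and retry policy are made up, while the RelevancyEvaluator, ChatClient and EvaluationRequest calls come from the APIs shown above.

java
import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.RelevancyEvaluator;

class GuardedAnswerService {

	private final ChatClient chatClient;

	private final RelevancyEvaluator relevancyEvaluator;

	GuardedAnswerService(ChatClient.Builder builder) {
		this.chatClient = builder.build();
		this.relevancyEvaluator = new RelevancyEvaluator(builder);
	}

	// Re-ask the model up to maxAttempts times until the answer is judged relevant
	// to the supplied context documents; the last attempt is returned either way.
	String answer(String question, List<Document> contextDocs, int maxAttempts) {
		String answer = null;
		for (int attempt = 0; attempt < maxAttempts; attempt++) {
			answer = this.chatClient.prompt().user(question).call().content();
			EvaluationRequest request = new EvaluationRequest(question, contextDocs, answer);
			if (this.relevancyEvaluator.evaluate(request).isPass()) {
				return answer;
			}
		}
		return answer;
	}

}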

FactCheckingEvaluator

org/springframework/ai/evaluation/FactCheckingEvaluator.java

java
public class FactCheckingEvaluator implements Evaluator {

	private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
				Evaluate whether or not the following claim is supported by the provided document.
				Respond with "yes" if the claim is supported, or "no" if it is not.
				Document: \n {document}\n
				Claim: \n {claim}
			""";

	private static final String BESPOKE_EVALUATION_PROMPT_TEXT = """
				Document: \n {document}\n
				Claim: \n {claim}
			""";

	private final ChatClient.Builder chatClientBuilder;

	private final String evaluationPrompt;

	/**
	 * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder. Uses
	 * the default evaluation prompt suitable for general purpose LLMs.
	 * @param chatClientBuilder The builder for the ChatClient used to perform the
	 * evaluation
	 */
	public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
		this(chatClientBuilder, DEFAULT_EVALUATION_PROMPT_TEXT);
	}

	/**
	 * Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder and
	 * evaluation prompt.
	 * @param chatClientBuilder The builder for the ChatClient used to perform the
	 * evaluation
	 * @param evaluationPrompt The prompt text to use for evaluation
	 */
	public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder, String evaluationPrompt) {
		this.chatClientBuilder = chatClientBuilder;
		this.evaluationPrompt = evaluationPrompt;
	}

	/**
	 * Creates a FactCheckingEvaluator configured for use with the Bespoke Minicheck
	 * model.
	 * @param chatClientBuilder The builder for the ChatClient used to perform the
	 * evaluation
	 * @return A FactCheckingEvaluator configured for Bespoke Minicheck
	 */
	public static FactCheckingEvaluator forBespokeMinicheck(ChatClient.Builder chatClientBuilder) {
		return new FactCheckingEvaluator(chatClientBuilder, BESPOKE_EVALUATION_PROMPT_TEXT);
	}

	/**
	 * Evaluates whether the response content in the EvaluationRequest is factually
	 * supported by the context provided in the same request.
	 * @param evaluationRequest The request containing the response to be evaluated and
	 * the supporting context
	 * @return An EvaluationResponse indicating whether the claim is supported by the
	 * document
	 */
	@Override
	public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
		var response = evaluationRequest.getResponseContent();
		var context = doGetSupportingData(evaluationRequest);

		String evaluationResponse = this.chatClientBuilder.build()
			.prompt()
			.user(userSpec -> userSpec.text(this.evaluationPrompt).param("document", context).param("claim", response))
			.call()
			.content();

		boolean passing = evaluationResponse.equalsIgnoreCase("yes");
		return new EvaluationResponse(passing, "", Collections.emptyMap());
	}

}

FactCheckingEvaluator is designed to assess the factual accuracy of AI-generated responses against the provided context. It checks whether a given claim is logically supported by the provided document, which helps detect and reduce hallucinations in AI output. When FactCheckingEvaluator is used, the claim and the document are submitted to an AI model for evaluation. This task can be handled more efficiently by a smaller, purpose-built model such as Bespoke's Minicheck. Minicheck is a small, efficient model designed specifically for fact checking: given a piece of factual text and a generated output, it verifies whether the claim is consistent with the document, answering "yes" if the document supports the claim and "no" otherwise. This makes it particularly well suited to retrieval-augmented generation (RAG) applications, where generated answers should stay grounded in the retrieved context.
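
Since the Bespoke-specific prompt is already baked into the class, the forBespokeMinicheck factory shown above can be used instead of the plain constructor when the underlying ChatModel points at the Minicheck model; a minimal sketch (the wrapper class is hypothetical):

java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.evaluation.FactCheckingEvaluator;

class FactCheckingEvaluatorFactory {

	// chatModel is expected to point at the bespoke-minicheck model (e.g. served by Ollama);
	// forBespokeMinicheck swaps in the shorter Document/Claim prompt shown above.
	FactCheckingEvaluator create(ChatModel chatModel) {
		return FactCheckingEvaluator.forBespokeMinicheck(ChatClient.builder(chatModel));
	}

}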

Example

java
@Test
void testFactChecking() {
  // Set up the Ollama API
  OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");

  ChatModel chatModel = new OllamaChatModel(ollamaApi,
				OllamaOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());

  // Create the FactCheckingEvaluator
  var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));

  // Example context and claim
  String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
  String claim = "The Earth is the fourth planet from the Sun.";

  // Create an EvaluationRequest
  EvaluationRequest evaluationRequest = new EvaluationRequest(context, Collections.emptyList(), claim);

  // Perform the evaluation
  EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);

  assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");

}

Here Ollama is used to call the bespoke-minicheck model with temperature set to 0.0 (and numPredict limited to 2 tokens); the context and claim are then passed to factCheckingEvaluator for evaluation.

Summary

Spring AI provides the Evaluator interface, which defines an evaluate method for assessing AI-generated content, for example to check that the model has not produced a hallucinated response. It has two implementations: RelevancyEvaluator, which evaluates whether a response is relevant to the provided context, and FactCheckingEvaluator, which evaluates factual accuracy.

doc
