Preface
This article takes a look at Spring AI's Evaluator.
Evaluator
spring-ai-client-chat/src/main/java/org/springframework/ai/evaluation/Evaluator.java
@FunctionalInterface
public interface Evaluator {
EvaluationResponse evaluate(EvaluationRequest evaluationRequest);
default String doGetSupportingData(EvaluationRequest evaluationRequest) {
List<Document> data = evaluationRequest.getDataList();
return data.stream()
.map(Document::getText)
.filter(StringUtils::hasText)
.collect(Collectors.joining(System.lineSeparator()));
}
}
The Evaluator interface defines a single evaluate method for assessing AI-generated content, so that hallucinated responses can be detected instead of silently accepted. It has two implementations: RelevancyEvaluator and FactCheckingEvaluator.
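Because Evaluator is a @FunctionalInterface, a custom check can also be supplied as a lambda. The following is only a minimal sketch; the class name and the non-empty check are made up for illustration and are not part of Spring AI:
import java.util.Collections;

import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.Evaluator;

public class NonEmptyResponseEvaluatorSample {

    // Evaluator is a functional interface, so a lambda is enough:
    // pass only when the model actually produced a non-blank answer
    static final Evaluator NON_EMPTY = request -> {
        boolean pass = request.getResponseContent() != null
                && !request.getResponseContent().isBlank();
        return new EvaluationResponse(pass, pass ? "" : "empty response", Collections.emptyMap());
    };

    public static void main(String[] args) {
        EvaluationRequest request = new EvaluationRequest("What is the purpose of Carina?", "Carina is ...");
        System.out.println(NON_EMPTY.evaluate(request).isPass()); // true
    }
}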
EvaluationRequest
org/springframework/ai/evaluation/EvaluationRequest.java
public class EvaluationRequest {
private final String userText;
private final List<Document> dataList;
private final String responseContent;
public EvaluationRequest(String userText, String responseContent) {
this(userText, Collections.emptyList(), responseContent);
}
public EvaluationRequest(List<Document> dataList, String responseContent) {
this("", dataList, responseContent);
}
public EvaluationRequest(String userText, List<Document> dataList, String responseContent) {
this.userText = userText;
this.dataList = dataList;
this.responseContent = responseContent;
}
//......
}
EvaluationRequest defines the userText, dataList and responseContent properties: userText is the user's input, dataList is the context data (for example content retrieved and appended by RAG), and responseContent is the AI model's response.
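As a quick illustration, a request carrying RAG context could be assembled like this, using only the constructors shown above; the question and document text are made up:
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;

public class EvaluationRequestSample {

    public static void main(String[] args) {
        // userText = the original question, dataList = the retrieved context, responseContent = the model's answer
        List<Document> retrieved = List.of(new Document("Carina is a project whose purpose is ..."));
        EvaluationRequest request = new EvaluationRequest(
                "What is the purpose of Carina?",
                retrieved,
                "Carina is a project whose purpose is ...");
        System.out.println(request.getDataList().size()); // 1
    }
}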
EvaluationResponse
org/springframework/ai/evaluation/EvaluationResponse.java
public class EvaluationResponse {
private final boolean pass;
private final float score;
private final String feedback;
private final Map<String, Object> metadata;
@Deprecated
public EvaluationResponse(boolean pass, float score, String feedback, Map<String, Object> metadata) {
this.pass = pass;
this.score = score;
this.feedback = feedback;
this.metadata = metadata;
}
public EvaluationResponse(boolean pass, String feedback, Map<String, Object> metadata) {
this.pass = pass;
this.score = 0;
this.feedback = feedback;
this.metadata = metadata;
}
//......
}
EvaluationResponse defines the pass, score, feedback and metadata properties; the constructor that also takes a score is deprecated, and the remaining one leaves score at 0.
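Callers typically inspect the result through the accessors elided by //...... above (for instance isPass(), which the tests below rely on). A minimal sketch using the non-deprecated constructor:
import java.util.Collections;

import org.springframework.ai.evaluation.EvaluationResponse;

public class EvaluationResponseSample {

    public static void main(String[] args) {
        // the non-deprecated constructor records only pass, feedback and metadata (score stays 0)
        EvaluationResponse response = new EvaluationResponse(true, "response is consistent with the context", Collections.emptyMap());
        System.out.println(response.isPass());      // true
        System.out.println(response.getFeedback()); // response is consistent with the context
    }
}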
RelevancyEvaluator
org/springframework/ai/evaluation/RelevancyEvaluator.java
public class RelevancyEvaluator implements Evaluator {
private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
Your task is to evaluate if the response for the query
is in line with the context information provided.\n
You have two options to answer. Either YES/ NO.\n
Answer - YES, if the response for the query
is in line with context information otherwise NO.\n
Query: \n {query}\n
Response: \n {response}\n
Context: \n {context}\n
Answer: "
""";
private final ChatClient.Builder chatClientBuilder;
public RelevancyEvaluator(ChatClient.Builder chatClientBuilder) {
this.chatClientBuilder = chatClientBuilder;
}
@Override
public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
var response = evaluationRequest.getResponseContent();
var context = doGetSupportingData(evaluationRequest);
String evaluationResponse = this.chatClientBuilder.build()
.prompt()
.user(userSpec -> userSpec.text(DEFAULT_EVALUATION_PROMPT_TEXT)
.param("query", evaluationRequest.getUserText())
.param("response", response)
.param("context", context))
.call()
.content();
boolean passing = false;
float score = 0;
if (evaluationResponse.toLowerCase().contains("yes")) {
passing = true;
score = 1;
}
return new EvaluationResponse(passing, score, "", Collections.emptyMap());
}
}
RelevancyEvaluator asks the AI to judge whether the response is consistent with the supplied context and to answer YES or NO. If the answer contains "yes", passing is set to true and score to 1; otherwise passing stays false and score stays 0.
Example
@Test
void testEvaluation() {
dataController.delete();
dataController.load();
String userText = "What is the purpose of Carina?";
ChatResponse response = ChatClient.builder(chatModel)
.build().prompt()
.advisors(new QuestionAnswerAdvisor(vectorStore))
.user(userText)
.call()
.chatResponse();
String responseContent = response.getResult().getOutput().getContent();
var relevancyEvaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));
EvaluationRequest evaluationRequest = new EvaluationRequest(userText,
(List<Document>) response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS), responseContent);
EvaluationResponse evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest);
assertTrue(evaluationResponse.isPass(), "Response is not relevant to the question");
}
Here userText is first sent to the AI through a ChatClient with a QuestionAnswerAdvisor backed by the vectorStore; the responseContent and the documents stored under QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS are then handed to relevancyEvaluator, which asks the AI to evaluate the relevance.
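The test assumes a dataController and a vectorStore that have been populated with documents about Carina. A purely illustrative setup might look like the following; this DataController and the document text are not part of Spring AI, they only mirror what the test expects:
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;

// hypothetical helper mirroring the dataController used in the test above:
// load() puts a few documents into the VectorStore that QuestionAnswerAdvisor retrieves from
public class DataController {

    private final VectorStore vectorStore;

    public DataController(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void load() {
        this.vectorStore.add(List.of(
                new Document("Carina is a ... (document describing the purpose of Carina)")));
    }

    public void delete() {
        // cleanup depends on the concrete VectorStore implementation and is omitted here
    }
}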
FactCheckingEvaluator
org/springframework/ai/evaluation/FactCheckingEvaluator.java
public class FactCheckingEvaluator implements Evaluator {
private static final String DEFAULT_EVALUATION_PROMPT_TEXT = """
Evaluate whether or not the following claim is supported by the provided document.
Respond with "yes" if the claim is supported, or "no" if it is not.
Document: \n {document}\n
Claim: \n {claim}
""";
private static final String BESPOKE_EVALUATION_PROMPT_TEXT = """
Document: \n {document}\n
Claim: \n {claim}
""";
private final ChatClient.Builder chatClientBuilder;
private final String evaluationPrompt;
/**
* Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder. Uses
* the default evaluation prompt suitable for general purpose LLMs.
* @param chatClientBuilder The builder for the ChatClient used to perform the
* evaluation
*/
public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
this(chatClientBuilder, DEFAULT_EVALUATION_PROMPT_TEXT);
}
/**
* Constructs a new FactCheckingEvaluator with the provided ChatClient.Builder and
* evaluation prompt.
* @param chatClientBuilder The builder for the ChatClient used to perform the
* evaluation
* @param evaluationPrompt The prompt text to use for evaluation
*/
public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder, String evaluationPrompt) {
this.chatClientBuilder = chatClientBuilder;
this.evaluationPrompt = evaluationPrompt;
}
/**
* Creates a FactCheckingEvaluator configured for use with the Bespoke Minicheck
* model.
* @param chatClientBuilder The builder for the ChatClient used to perform the
* evaluation
* @return A FactCheckingEvaluator configured for Bespoke Minicheck
*/
public static FactCheckingEvaluator forBespokeMinicheck(ChatClient.Builder chatClientBuilder) {
return new FactCheckingEvaluator(chatClientBuilder, BESPOKE_EVALUATION_PROMPT_TEXT);
}
/**
* Evaluates whether the response content in the EvaluationRequest is factually
* supported by the context provided in the same request.
* @param evaluationRequest The request containing the response to be evaluated and
* the supporting context
* @return An EvaluationResponse indicating whether the claim is supported by the
* document
*/
@Override
public EvaluationResponse evaluate(EvaluationRequest evaluationRequest) {
var response = evaluationRequest.getResponseContent();
var context = doGetSupportingData(evaluationRequest);
String evaluationResponse = this.chatClientBuilder.build()
.prompt()
.user(userSpec -> userSpec.text(this.evaluationPrompt).param("document", context).param("claim", response))
.call()
.content();
boolean passing = evaluationResponse.equalsIgnoreCase("yes");
return new EvaluationResponse(passing, "", Collections.emptyMap());
}
}
FactCheckingEvaluator is designed to assess the factual accuracy of AI-generated responses against the given context. It helps detect and reduce hallucinations in AI output by verifying whether a given claim is logically supported by the provided context (the document). The claim and the document are submitted to an AI model for evaluation. To do this more efficiently, a smaller, specialised model can be used, such as Bespoke's Minicheck. Minicheck is a small, efficient model built specifically for fact checking: given a piece of factual information and a generated output, it verifies whether the claim is consistent with the document, answering "yes" if the document supports the claim and "no" otherwise. This is particularly useful in retrieval-augmented generation (RAG) applications, to make sure generated answers are grounded in the context.
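The example in the next section builds the evaluator with the default prompt; when targeting bespoke-minicheck, the forBespokeMinicheck factory shown above can also be used, since that model only expects the bare document/claim pair. A minimal sketch, assuming chatModel is an Ollama-backed ChatModel serving bespoke-minicheck:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.evaluation.FactCheckingEvaluator;

public class BespokeFactCheckingSample {

    // chatModel is assumed to be configured for the bespoke-minicheck model (e.g. served by Ollama);
    // forBespokeMinicheck swaps in the shorter document/claim prompt that this model expects
    static FactCheckingEvaluator buildEvaluator(ChatModel chatModel) {
        return FactCheckingEvaluator.forBespokeMinicheck(ChatClient.builder(chatModel));
    }
}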
Example
@Test
void testFactChecking() {
// Set up the Ollama API
OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");
ChatModel chatModel = new OllamaChatModel(ollamaApi,
OllamaOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());
// Create the FactCheckingEvaluator
var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));
// Example context and claim
String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
String claim = "The Earth is the fourth planet from the Sun.";
// Create an EvaluationRequest
EvaluationRequest evaluationRequest = new EvaluationRequest(List.of(new Document(context)), claim);
// Perform the evaluation
EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);
assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");
}
Here Ollama runs the bespoke-minicheck model with temperature 0.0 and numPredict 2; the context is wrapped in a Document and passed together with the claim to factCheckingEvaluator for evaluation.
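For the opposite case, a claim that is actually backed by the context should be reported as passing. A sketch continuing the test above with the same factCheckingEvaluator (the wording of the claim is illustrative):
// supported claim: the evaluator should answer "yes", so isPass() is expected to be true
EvaluationRequest supportedRequest = new EvaluationRequest(
        List.of(new Document("The Earth is the third planet from the Sun and the only astronomical object known to harbor life.")),
        "The Earth is the third planet from the Sun.");
assertTrue(factCheckingEvaluator.evaluate(supportedRequest).isPass(),
        "The claim should be supported by the context");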
Summary
Spring AI provides the Evaluator interface, whose evaluate method is used to assess AI-generated content and catch hallucinated responses. It has two implementations: RelevancyEvaluator, which evaluates relevance to the provided context, and FactCheckingEvaluator, which evaluates factual accuracy.