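Now open source! Stars and forks are welcome. eval-engine, an open-source evaluation engine written in Java: gitee.com/skeletron20...
Evaluation requirements
We need to evaluate how similar a Q&A assistant's responses are to the ground truth.
Evaluation implementation
1. Add the latest eval-engine dependency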
xml
<dependency>
    <groupId>io.gitee.skeletron2011</groupId>
    <artifactId>eval-engine</artifactId>
    <version>0.0.3</version>
</dependency>
2. Write the evaluation code
The overall evaluation workflow is shown in the figure:

The corresponding code:
java
package example1.testsuite;

import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
import org.evaltool.evalengine.common.utils.DateUtils;
import org.evaltool.evalengine.eval.model.ApiCompletionResult;
import org.evaltool.evalengine.eval.model.DataItem;
import org.evaltool.evalengine.eval.model.InputData;
import org.evaltool.evalengine.eval.node.api.ApiCompletion;
import org.evaltool.evalengine.eval.node.dataloader.DataLoader;
import org.evaltool.evalengine.eval.node.reporter.HtmlReporter;
import org.evaltool.evalengine.eval.node.reporter.JsonReporter;
import org.evaltool.evalengine.eval.node.reporter.Reporter;
import org.evaltool.evalengine.eval.node.scorer.Scorer;
import org.evaltool.evalengine.eval.node.scorer.VectorSimilarityScorer;
import org.evaltool.evalengine.workflow.WorkflowBuilder;
import org.evaltool.evalengine.workflow.WorkflowNode;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

import java.util.List;
import java.util.Map;

/**
 * Run with:
 * mvn clean test -DsuiteXmlFile=src/test/example1.xml -Dtest=example1.testsuite.EvalTest#test
 */
public class EvalTest {
    WorkflowBuilder builder;
    DataLoader dataLoader;
    ApiCompletion apiCompletion;
    Scorer scorer;
    Reporter stdOutReporter;
    Reporter htmlReporter;
    Reporter jsonReporter;

    @BeforeMethod
    public void beforeMethod() {
        // Data loader: supplies the evaluation set (one query with its ground truth).
        dataLoader = new DataLoader() {
            @Override
            public List<InputData> prepareDataList() {
                return List.of(new InputData(Map.of("query", "Hello,world!", "groundTruth", "Hi, My friends!")));
            }
        };

        // API completion: stands in for the assistant under test and returns a fixed reply.
        apiCompletion = new ApiCompletion() {
            @Override
            protected ApiCompletionResult invoke(DataItem dataItem) {
                return new ApiCompletionResult(Map.of("response", "Hi!"));
            }
        };

        // Scorer: compares ground truth and response against a threshold of 0.9.
        // The metric name (Chinese for "similarity evaluation") appears verbatim in the results.
        scorer = new VectorSimilarityScorer("相似度品评估", 0.9) {
            @Override
            public Pair<String, String> prepareFieldPair(DataItem dataItem) {
                String groundTruth = dataItem.getInputData().get("groundTruth");
                String response = dataItem.getApiCompletionResult().get("response");
                return new ImmutablePair<>(groundTruth, response);
            }
        };

        // Reporters: print to stdout and write HTML and JSON files.
        stdOutReporter = new Reporter() {
            @Override
            protected void report(List<DataItem> items) {
                items.forEach(System.out::println);
            }
        };
        String fileName = "EvalTest" + DateUtils.getNowDateStr();
        htmlReporter = new HtmlReporter(fileName);
        jsonReporter = new JsonReporter(fileName);
    }

    @Test
    public void test() {
        WorkflowNode[] reporters = {stdOutReporter, htmlReporter, jsonReporter};
        builder = new WorkflowBuilder();

        // Register every node, then wire them up:
        // dataLoader -> apiCompletion -> scorer -> all reporters.
        builder.addNodes(dataLoader, apiCompletion, scorer);
        builder.addNodes(reporters);
        builder.addDependency(dataLoader, apiCompletion);
        builder.addDependency(apiCompletion, scorer);
        builder.addDependencies(scorer, reporters);

        builder.build().execute();
    }
}
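The dependencies registered in test() form a small directed acyclic graph: the data loader feeds the API completion node, which feeds the scorer, which fans out to the three reporters. As a rough illustration of how such a graph can be turned into an execution order, here is a minimal sketch using Kahn's topological sort. WorkflowOrderSketch and its methods are hypothetical names made up for this example. It is not eval-engine's implementation, and the real WorkflowBuilder may schedule nodes differently (for example, in parallel).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a workflow graph; not part of eval-engine. */
public class WorkflowOrderSketch {
    // Node name -> names of the nodes that must run after it.
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    // Node name -> number of upstream nodes it depends on.
    private final Map<String, Integer> pending = new LinkedHashMap<>();

    public void addNode(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        pending.putIfAbsent(name, 0);
    }

    public void addDependency(String upstream, String downstream) {
        addNode(upstream);
        addNode(downstream);
        successors.get(upstream).add(downstream);
        pending.merge(downstream, 1, Integer::sum);
    }

    /** Returns one valid execution order (Kahn's algorithm); throws if there is a cycle. */
    public List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(pending);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((name, count) -> {
            if (count == 0) {
                ready.add(name); // no upstream nodes, can run immediately
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            // Finishing a node releases each successor whose last dependency it was.
            for (String next : successors.get(node)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        WorkflowOrderSketch graph = new WorkflowOrderSketch();
        graph.addDependency("dataLoader", "apiCompletion");
        graph.addDependency("apiCompletion", "scorer");
        for (String reporter : List.of("stdOutReporter", "htmlReporter", "jsonReporter")) {
            graph.addDependency("scorer", reporter);
        }
        System.out.println(graph.executionOrder());
    }
}
```

With the same edges as in EvalTest, the sketch runs the data loader first and the reporters last, which is the order the workflow needs: a reporter can only report once the scorer has produced results.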
Running the evaluation
Run the following mvn command to start the evaluation:
bash
mvn clean test -DsuiteXmlFile=src/test/example1.xml -Dtest=example1.testsuite.EvalTest#test
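Evaluation results
Because the JSON and HTML reporters were added to the workflow, the run produces a JSON evaluation result and an HTML evaluation report.
The JSON evaluation result is shown below. The metric name is the Chinese string passed to the scorer ("similarity evaluation"), and the reason reads "similarity is 0.5774, below the threshold 0.9000":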
json
{
  "dataItems": [
    {
      "dataIndex": 0,
      "inputData": {
        "dataIndex": 0,
        "inputItem": {
          "query": "Hello,world!",
          "groundTruth": "Hi, My friends!"
        }
      },
      "apiCompletionResult": {
        "dataIndex": 0,
        "resultItem": {
          "response": "Hi!"
        },
        "timeCost": 0
      },
      "scorerResults": [
        {
          "dataIndex": 0,
          "metric": "相似度品评估",
          "score": 0.0,
          "reason": "相似度为0.5774,小于阈值0.9000",
          "extra": {
            "similarity": 0.5773502691896258,
            "threshold": 0.9
          },
          "timeCost": 97
        }
      ],
      "extra": null
    }
  ],
  "countResult": null
}
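The reported similarity, 0.5773502691896258, equals 1/√3. That is exactly what a cosine similarity over word-count vectors gives for these two strings: "Hi, My friends!" has three distinct words, "Hi!" has one, and they share one. The sketch below reproduces that arithmetic. It is only a guess at the kind of computation involved. How VectorSimilarityScorer actually tokenizes and vectorizes text is not documented here, and CosineSimilaritySketch, termCounts, cosine and score are hypothetical names. The rule that a passing item scores 1.0 is also an assumption; the result above only shows that a failing item scores 0.0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; eval-engine's VectorSimilarityScorer may work differently. */
public class CosineSimilaritySketch {

    /** Counts lower-cased words, splitting on anything that is not a letter or digit. */
    public static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Euclidean length of a word-count vector. */
    public static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine similarity of the two word-count vectors; 0.0 if either text has no words. */
    public static double cosine(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : va.entrySet()) {
            dot += entry.getValue() * vb.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(va);
        double normB = norm(vb);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** Assumed pass/fail rule: 1.0 when the similarity reaches the threshold, otherwise 0.0. */
    public static double score(double similarity, double threshold) {
        return similarity >= threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double similarity = cosine("Hi, My friends!", "Hi!");
        System.out.println("similarity = " + similarity);
        System.out.println("score      = " + score(similarity, 0.9));
    }
}
```

Under these assumptions the dot product is 1 and the vector lengths are √3 and 1, so the similarity is 1/√3 ≈ 0.5774, below the 0.9 threshold, which matches the score of 0.0 in the result.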
The HTML evaluation report looks like this:
