🚀 AI Evaluation in Practice (2): Assessing Q&A Similarity with the Open-Source Evaluation Engine eval-engine

It's open source! Stars and forks are welcome. eval-engine, an open-source evaluation engine written in Java: gitee.com/skeletron20...

Evaluation Requirements

We need to evaluate how similar a Q&A assistant's responses are to the ground truth.

Implementation

1. Add the latest eval-engine dependency

```xml
<!-- Add to the <dependencies> section of pom.xml -->
<dependency>
    <groupId>io.gitee.skeletron2011</groupId>
    <artifactId>eval-engine</artifactId>
    <version>0.0.3</version>
</dependency>
```

2. Write the evaluation code

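The overall evaluation workflow is shown in the figure: a DataLoader feeds an ApiCompletion node, whose results go to a Scorer, which in turn feeds three reporters (stdout, HTML and JSON).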

The implementation:

```java
package example1.testsuite;

import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
import org.evaltool.evalengine.common.utils.DateUtils;
import org.evaltool.evalengine.eval.model.ApiCompletionResult;
import org.evaltool.evalengine.eval.model.DataItem;
import org.evaltool.evalengine.eval.model.InputData;
import org.evaltool.evalengine.eval.node.api.ApiCompletion;
import org.evaltool.evalengine.eval.node.dataloader.DataLoader;
import org.evaltool.evalengine.eval.node.reporter.HtmlReporter;
import org.evaltool.evalengine.eval.node.reporter.JsonReporter;
import org.evaltool.evalengine.eval.node.reporter.Reporter;
import org.evaltool.evalengine.eval.node.scorer.Scorer;
import org.evaltool.evalengine.eval.node.scorer.VectorSimilarityScorer;
import org.evaltool.evalengine.workflow.WorkflowBuilder;
import org.evaltool.evalengine.workflow.WorkflowNode;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

import java.util.List;
import java.util.Map;

/**
 * Run with:
 * mvn clean test -DsuiteXmlFile=src/test/example1.xml -Dtest=example1.testsuite.EvalTest#test
 */
public class EvalTest {
    WorkflowBuilder builder;
    DataLoader dataLoader;
    ApiCompletion apiCompletion;
    Scorer scorer;
    Reporter stdOutReporter;
    Reporter htmlReporter;
    Reporter jsonReporter;

    @BeforeMethod
    public void beforeMethod() {
        // Data loader: supplies the test cases (a query plus its ground-truth answer)
        dataLoader = new DataLoader() {
            @Override
            public List<InputData> prepareDataList() {
                return List.of(new InputData(Map.of("query", "Hello,world!", "groundTruth", "Hi, My friends!")));
            }
        };

        // API completion: calls the Q&A assistant under test; stubbed here to return a fixed response
        apiCompletion = new ApiCompletion() {
            @Override
            protected ApiCompletionResult invoke(DataItem dataItem) {
                return new ApiCompletionResult(Map.of("response", "Hi!"));
            }
        };

        // Scorer: vector similarity between ground truth and response, with a pass threshold of 0.9.
        // The metric name "相似度品评估" ("similarity evaluation") is kept verbatim so it
        // matches the "metric" field in the JSON result.
        scorer = new VectorSimilarityScorer("相似度品评估", 0.9) {
            @Override
            public Pair<String, String> prepareFieldPair(DataItem dataItem) {
                // Pick the two texts to compare: (ground truth, assistant response)
                String groundTruth = dataItem.getInputData().get("groundTruth");
                String response = dataItem.getApiCompletionResult().get("response");
                return new ImmutablePair<>(groundTruth, response);
            }
        };

        // Reporter that prints every evaluated item to stdout
        stdOutReporter = new Reporter() {
            @Override
            protected void report(List<DataItem> items) {
                items.forEach(System.out::println);
            }
        };
        // HTML and JSON reporters write result files named with the current date string
        String fileName = "EvalTest" + DateUtils.getNowDateStr();
        htmlReporter = new HtmlReporter(fileName);
        jsonReporter = new JsonReporter(fileName);
    }

    @Test
    public void test() {
        WorkflowNode[] reporters = {stdOutReporter, htmlReporter, jsonReporter};
        // Register all nodes with the workflow
        builder = new WorkflowBuilder();
        builder.addNodes(dataLoader, apiCompletion, scorer);
        builder.addNodes(reporters);
        // Wire the pipeline: dataLoader -> apiCompletion -> scorer -> every reporter
        builder.addDependency(dataLoader, apiCompletion);
        builder.addDependency(apiCompletion, scorer);
        builder.addDependencies(scorer, reporters);

        // Build the workflow and run it
        builder.build().execute();
    }
}
```
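The `addDependency` calls wire the nodes into a directed acyclic graph: dataLoader → apiCompletion → scorer, with the scorer fanning out to the three reporters. How eval-engine actually schedules this graph is not shown in this article, so the following is only a minimal, hypothetical sketch: a `MiniWorkflow` class (illustrative, not part of eval-engine's API) that uses Kahn's topological sort over string node names to derive one valid execution order and to reject cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini workflow: NOT eval-engine's implementation, just an
// illustration of ordering nodes by their declared dependencies.
class MiniWorkflow {
    // node name -> downstream node names, in insertion order
    private final Map<String, List<String>> edges = new LinkedHashMap<>();
    // node name -> number of upstream nodes it waits for
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addNodes(String... names) {
        for (String n : names) {
            edges.putIfAbsent(n, new ArrayList<>());
            inDegree.putIfAbsent(n, 0);
        }
    }

    // upstream must run before downstream (mirrors addDependency(from, to))
    void addDependency(String upstream, String downstream) {
        addNodes(upstream, downstream);
        edges.get(upstream).add(downstream);
        inDegree.merge(downstream, 1, Integer::sum);
    }

    // Kahn's algorithm: repeatedly emit nodes whose upstream nodes have all run
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((n, d) -> { if (d == 0) ready.add(n); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String next : edges.get(n)) {
                // one fewer pending upstream; enqueue once none remain
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("cycle detected");
        }
        return order;
    }

    public static void main(String[] args) {
        MiniWorkflow wf = new MiniWorkflow();
        wf.addNodes("dataLoader", "apiCompletion", "scorer");
        wf.addNodes("stdOutReporter", "htmlReporter", "jsonReporter");
        wf.addDependency("dataLoader", "apiCompletion");
        wf.addDependency("apiCompletion", "scorer");
        for (String r : List.of("stdOutReporter", "htmlReporter", "jsonReporter")) {
            wf.addDependency("scorer", r);
        }
        // prints [dataLoader, apiCompletion, scorer, stdOutReporter, htmlReporter, jsonReporter]
        System.out.println(wf.executionOrder());
    }
}
```

Because the reporters depend only on the scorer and not on each other, any order among them is valid; an engine is free to run them in parallel once scoring finishes.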

Running the Evaluation

Run the following mvn command to start the evaluation:

```bash
mvn clean test -DsuiteXmlFile=src/test/example1.xml -Dtest=example1.testsuite.EvalTest#test
```

Evaluation Results

Because JSON and HTML reporters were added to the workflow, the run produces both a JSON result file and an HTML report.

The JSON result:

```json
{
  "dataItems": [
    {
      "dataIndex": 0,
      "inputData": {
        "dataIndex": 0,
        "inputItem": {
          "query": "Hello,world!",
          "groundTruth": "Hi, My friends!"
        }
      },
      "apiCompletionResult": {
        "dataIndex": 0,
        "resultItem": {
          "response": "Hi!"
        },
        "timeCost": 0
      },
      "scorerResults": [
        {
          "dataIndex": 0,
          "metric": "相似度品评估",
          "score": 0.0,
          "reason": "相似度为0.5774,小于阈值0.9000",
          "extra": {
            "similarity": 0.5773502691896258,
            "threshold": 0.9
          },
          "timeCost": 97
        }
      ],
      "extra": null
    }
  ],
  "countResult": null
}
```
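The `reason` field reads "similarity 0.5774 is below the threshold 0.9000", which is why `score` is 0.0. The value 0.5774 happens to match a plain bag-of-words cosine similarity: after lowercasing and splitting on non-letters, "Hi, My friends!" becomes {hi, my, friends} and "Hi!" becomes {hi}, so the cosine is 1 / (√3 · 1) ≈ 0.5774. Whether `VectorSimilarityScorer` really vectorizes text this way is an assumption, as is the pass/fail rule used below (1.0 at or above the threshold, 0.0 below); the JSON only confirms that a similarity under 0.9 yields 0.0. The `CosineSketch` class is a hypothetical illustration, not eval-engine code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: bag-of-words cosine similarity plus a threshold check.
// NOT taken from eval-engine's source; it only reproduces the numbers in the
// JSON result under the assumptions stated in the comments.
class CosineSketch {

    // Assumed tokenization: lowercase, split on anything that is not a letter or digit.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // cosine(a, b) = (a . b) / (|a| * |b|) over term-count vectors
    static double cosine(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
        }
        double normA = 0;
        for (int c : va.values()) normA += (double) c * c;
        double normB = 0;
        for (int c : vb.values()) normB += (double) c * c;
        if (normA == 0 || normB == 0) return 0.0; // text with no tokens: treat as dissimilar
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Assumed scoring rule: 1.0 if similarity >= threshold, otherwise 0.0.
    static double score(double similarity, double threshold) {
        return similarity >= threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double sim = cosine("Hi, My friends!", "Hi!");
        // prints similarity=0.5774, score=0.0
        System.out.printf(Locale.ROOT, "similarity=%.4f, score=%.1f%n", sim, score(sim, 0.9));
    }
}
```

With a 0.9 threshold, a short but semantically correct reply such as "Hi!" fails against a longer ground truth, so the threshold (and the kind of vectors behind the similarity) should be chosen with the expected answer lengths in mind.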

The HTML report looks like this:
