Similar-Image Search with Elasticsearch 8 Vector Retrieval

Elasticsearch 8 adds kNN (k-nearest-neighbor) vector search, which can be used to build a search-by-image feature.

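1. First, create the index. ES provides the dense_vector field type for storing vectors. dims is the vector dimension; it can be omitted, in which case ES sets it from the dimension of the first vector indexed.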

{
  "properties": {
    "file_name": {
      "type": "text"
    },
    "feature": {
      "type": "dense_vector",
      "dims": 5
    },
    "number": {
      "type": "integer"
    },
    "data_type": {
      "type": "keyword"
    }
  }
}
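
The post does not show how the mapping is submitted. As a minimal sketch, assuming the index is named image (the name used by the Java search code later in this post) and the cluster listens on http://localhost:9200 without authentication, the create-index request could be built with the JDK's own HTTP client. The class and method names here are hypothetical, and the mapping is wrapped in "mappings" because that is what the create-index API expects.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class CreateImageIndexSketch {

    // The mapping from step 1, wrapped in "mappings" for the create-index API.
    static final String CREATE_INDEX_BODY =
            "{\n"
          + "  \"mappings\": {\n"
          + "    \"properties\": {\n"
          + "      \"file_name\": { \"type\": \"text\" },\n"
          + "      \"feature\":   { \"type\": \"dense_vector\", \"dims\": 5 },\n"
          + "      \"number\":    { \"type\": \"integer\" },\n"
          + "      \"data_type\": { \"type\": \"keyword\" }\n"
          + "    }\n"
          + "  }\n"
          + "}";

    // Builds (but does not send) a PUT /<indexName> request carrying the mapping.
    static HttpRequest buildCreateIndexRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(CREATE_INDEX_BODY))
                .build();
    }

    public static void main(String[] args) {
        // Assumed endpoint and index name; adjust to your cluster.
        HttpRequest request = buildCreateIndexRequest("http://localhost:9200", "image");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(CREATE_INDEX_BODY);
        // To send it against a running cluster:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

The same JSON body can be pasted into Postman as a PUT request to /image.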

2. Insert 10 test documents.
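
The post does not show how the test data was inserted. One possible approach, sketched here under the assumption that the _bulk API is used, is to build a newline-delimited JSON body with one action line and one source line per document. The three documents are copied from the result listing at the end of this post; the class and method names are hypothetical.

```java
import java.util.List;

class BulkBodySketch {

    // Builds an NDJSON body for the _bulk API:
    // an {"index": ...} action line followed by the document source, each ending in '\n'.
    static String toBulkBody(String indexName, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(indexName).append("\"}}\n");
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Three of the test documents, taken from the results shown in this post.
        List<String> docs = List.of(
                "{\"number\":6,\"feature\":[0.7,0.66,1.74,1.2,0.9],\"file_name\":\"6.jpg\",\"data_type\":\"aa\"}",
                "{\"number\":2,\"feature\":[0.5,0.3,1.7,1.9,1.8],\"file_name\":\"66.jpg\",\"data_type\":\"aa\"}",
                "{\"number\":23,\"feature\":[1.7,0.8,1.1,1.5,0.9],\"file_name\":\"23.jpg\",\"data_type\":\"bb\"}");
        System.out.print(toBulkBody("image", docs));
    }
}
```

The printed body would be sent as POST /_bulk with the header Content-Type: application/x-ndjson, for example from Postman.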

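3. Test the search directly from Postman. The kNN parameters are:

field: name of the vector field to search

query_vector: the input (query) vector

k: number of top-scoring nearest neighbors to return

num_candidates: number of candidate neighbors considered on each shard during the search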
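
The request body itself is not shown in the post. A minimal sketch of what it might look like, assuming the top-level knn option of the search API is used (POST /image/_search) and reusing the query vector and parameter values from the Java example later in this post. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class KnnSearchBodySketch {

    // Assembles a search body that uses the top-level "knn" option
    // with the four parameters described above.
    static String buildKnnSearchBody(String field, List<Float> queryVector, int k, int numCandidates) {
        String vector = queryVector.stream()
                .map(v -> Float.toString(v))
                .collect(Collectors.joining(", ", "[", "]"));
        return "{\n"
             + "  \"knn\": {\n"
             + "    \"field\": \"" + field + "\",\n"
             + "    \"query_vector\": " + vector + ",\n"
             + "    \"k\": " + k + ",\n"
             + "    \"num_candidates\": " + numCandidates + "\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        // Same vector, k and num_candidates as the Java API example in this post.
        System.out.println(buildKnnSearchBody(
                "feature", List.of(0.7f, 0.66f, 1.74f, 1.2f, 0.9f), 5, 10));
    }
}
```

Note that num_candidates must be at least as large as k, since the k results are chosen from the candidates gathered on each shard.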

For a detailed explanation of these parameters, see this article:

"How to choose the best k and num_candidates for kNN search" (CSDN blog)

4. Java API

Add the dependencies to pom.xml:

        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>8.15.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>8.15.2</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>2.0.30</version>
        </dependency>

Test class:

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.SortOrder;
import co.elastic.clients.elasticsearch._types.query_dsl.BoolQuery;
import co.elastic.clients.elasticsearch.core.SearchRequest;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ElasticsearchKnnTest {

    public static void main(String[] args) throws IOException {
        // Create the client; try-with-resources closes the low-level RestClient on exit
        try (RestClient restClient = RestClient.builder(HttpHost.create("localhost:9200")).build()) {
            ElasticsearchTransport transport = new RestClientTransport(
                    restClient, new JacksonJsonpMapper());
            ElasticsearchClient client = new ElasticsearchClient(transport);
            // Query vector (must have the same dimension as the "feature" field)
            List<Float> queryVector = new ArrayList<>();
            queryVector.add(0.7F);
            queryVector.add(0.66F);
            queryVector.add(1.74F);
            queryVector.add(1.2F);
            queryVector.add(0.9F);
            // Nearest neighbors to collect; in a knn query, k applies per shard,
            // so the merged result can contain more than k hits (capped by size)
            int top = 5;
            // Minimum score a hit must reach to be returned
            double minScore = 0.9D;
            // Build the query: kNN search on the "feature" field, sorted by score
            BoolQuery.Builder builder = new BoolQuery.Builder();
            builder.must(q -> q.knn(n -> n.field("feature").queryVector(queryVector).k(top).numCandidates(10)));
            SearchRequest request = new SearchRequest.Builder().index("image")
                    .minScore(minScore)
                    .query(q -> q.bool(builder.build()))
                    .from(0)
                    .size(10)
                    .sort(s -> s.field(f -> f.field("_score").order(SortOrder.Desc))).build();
            // An IOException now propagates instead of being swallowed,
            // which previously led to a NullPointerException on a null response
            SearchResponse<JSONObject> response = client.search(request, JSONObject.class);
            // Parse and print the hits
            List<Hit<JSONObject>> hits = response.hits().hits();
            for (Hit<JSONObject> hit : hits) {
                JSONObject data = hit.source();
                System.out.println(data.toJSONString() + "     score: " + hit.score());
            }
        }
    }
}

Results:

{"number":6,"feature":[0.7,0.66,1.74,1.2,0.9],"file_name":"6.jpg","data_type":"aa"} score: 0.9999949

{"number":2,"feature":[0.5,0.3,1.7,1.9,1.8],"file_name":"66.jpg","data_type":"aa"} score: 0.9714658

{"number":23,"feature":[1.7,0.8,1.1,1.5,0.9],"file_name":"23.jpg","data_type":"bb"} score: 0.9587538

{"number":7,"feature":[0.2,0.23,1.7,1.5,0.2],"file_name":"88.jpg","data_type":"cc"} score: 0.95746744

{"number":99,"feature":[0.3,1.2,1.7,0.7,1.9],"file_name":"9.jpg","data_type":"gg"} score: 0.949824

{"number":5,"feature":[0.2,1.3,1.7,1.9,0.2],"file_name":"77.jpg","data_type":"bb"} score: 0.94946384

{"number":10,"feature":[0.1,0.5,1.7,0.7,2.9],"file_name":"10.jpg","data_type":"bb"} score: 0.9173416
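
To get a feel for where such scores come from, here is a small sketch of the cosine-based score. It assumes the feature field uses cosine similarity (the dense_vector default, since the mapping above sets no similarity), for which Elasticsearch documents the kNN score as (1 + cosine) / 2. The scores listed above need not match this exact computation, for example if the index stores quantized vectors. The class and method names are hypothetical.

```java
class CosineScoreSketch {

    // Plain cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps cosine similarity from [-1, 1] to a score in [0, 1]: (1 + cosine) / 2.
    static double knnScore(double[] query, double[] doc) {
        return (1.0 + cosine(query, doc)) / 2.0;
    }

    public static void main(String[] args) {
        double[] query = {0.7, 0.66, 1.74, 1.2, 0.9};
        // Two of the stored vectors from the results above.
        double[] doc6 = {0.7, 0.66, 1.74, 1.2, 0.9};
        double[] doc2 = {0.5, 0.3, 1.7, 1.9, 1.8};
        System.out.println("6.jpg  exact score: " + knnScore(query, doc6));
        System.out.println("66.jpg exact score: " + knnScore(query, doc2));
    }
}
```

Under this formula a min_score of 0.9 corresponds to a cosine similarity of at least 0.8, which is one way to pick a threshold for "similar enough" images.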
