Similar-Image Search with Elasticsearch 8 Vector Retrieval

Elasticsearch 8 added approximate kNN vector search, which can be used to implement search-by-image.

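1. First, create the index. Elasticsearch provides the dense_vector field type for storing vectors. dims is the vector dimension; in recent 8.x releases it can be omitted, in which case Elasticsearch sets it from the dimension of the first vector indexed.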

{
  "properties": {
    "file_name": {
      "type": "text"
    },
    "feature": {
      "type": "dense_vector",
      "dims": 5
    },
    "number": {
      "type": "integer"
    },
    "data_type": {
      "type": "keyword"
    }
  }
}
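The mapping above shows only the properties object. As a minimal sketch of a complete create-index request, the Java snippet below wraps it in a "mappings" object and sends it with the JDK's built-in HTTP client. The class and method names are hypothetical, the index name image is taken from the Java example later in this post, and the explicit "index": true and "similarity": "cosine" settings are assumptions added for illustration (the original mapping relies on the defaults). It also assumes an unsecured node on http://localhost:9200.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class CreateImageIndex {

    // Builds the request body for PUT /image.
    // "index" and "similarity" are spelled out here as an assumption;
    // the mapping in the article leaves them at their defaults.
    static String createIndexBody(int dims) {
        return """
            {
              "mappings": {
                "properties": {
                  "file_name": { "type": "text" },
                  "feature": {
                    "type": "dense_vector",
                    "dims": %d,
                    "index": true,
                    "similarity": "cosine"
                  },
                  "number": { "type": "integer" },
                  "data_type": { "type": "keyword" }
                }
              }
            }
            """.formatted(dims);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a local node without TLS or authentication.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:9200/image"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(createIndexBody(5)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Text blocks and formatted() need Java 15 or later. On a cluster with security enabled, the request would also need HTTPS and credentials.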

2. Insert 10 test documents.
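The post does not show how the test data was inserted. One possible way is the _bulk API, whose body is newline-delimited JSON: an action line followed by the document, each terminated by a newline. The sketch below only builds that body; the class, record, and method names are hypothetical, and the two sample documents are copied from the search results shown at the end of this post.

```java
import java.util.List;

final class BulkBodyBuilder {

    // One test document; the fields mirror the index mapping.
    record ImageDoc(int number, float[] feature, String fileName, String dataType) {}

    // Renders a float[] as a JSON array, e.g. [0.7,0.66,1.74]
    static String vectorJson(float[] v) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(v[i]);
        }
        return sb.append(']').toString();
    }

    // Builds the NDJSON body for POST /_bulk.
    // Strings are not JSON-escaped: this assumes plain file names and types.
    static String bulkBody(String index, List<ImageDoc> docs) {
        StringBuilder sb = new StringBuilder();
        for (ImageDoc d : docs) {
            sb.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            sb.append("{\"number\":").append(d.number())
              .append(",\"feature\":").append(vectorJson(d.feature()))
              .append(",\"file_name\":\"").append(d.fileName())
              .append("\",\"data_type\":\"").append(d.dataType()).append("\"}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<ImageDoc> docs = List.of(
                new ImageDoc(6, new float[]{0.7f, 0.66f, 1.74f, 1.2f, 0.9f}, "6.jpg", "aa"),
                new ImageDoc(2, new float[]{0.5f, 0.3f, 1.7f, 1.9f, 1.8f}, "66.jpg", "aa"));
        System.out.print(bulkBody("image", docs));
    }
}
```

The resulting string can be sent as the body of a POST to /_bulk with the Content-Type header set to application/x-ndjson.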

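3. Test the search directly with Postman. The kNN search parameters are:

field: name of the vector field to search

query_vector: the input vector

k: number of top-scoring results to return

num_candidates: number of candidate neighbours considered on each shard during the search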
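The request body sent from Postman is not reproduced in the text. As a rough sketch, the four parameters above could be assembled into the top-level knn section of a search request (POST /image/_search) like this. The class and method names are hypothetical, and the values match those used in the Java example later in this post.

```java
import java.util.List;
import java.util.stream.Collectors;

final class KnnSearchBody {

    // Builds a search body with a top-level "knn" section.
    static String knnBody(String field, List<Float> queryVector, int k, int numCandidates) {
        // Elasticsearch rejects requests where num_candidates is smaller than k.
        if (numCandidates < k) {
            throw new IllegalArgumentException("num_candidates must not be less than k");
        }
        String vector = queryVector.stream()
                .map(f -> Float.toString(f))
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"knn\":{\"field\":\"" + field + "\",\"query_vector\":" + vector
                + ",\"k\":" + k + ",\"num_candidates\":" + numCandidates + "}}";
    }

    public static void main(String[] args) {
        List<Float> query = List.of(0.7f, 0.66f, 1.74f, 1.2f, 0.9f);
        System.out.println(knnBody("feature", query, 5, 10));
    }
}
```

A larger num_candidates generally trades speed for recall, since more candidates are examined on each shard before the top k are kept.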

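For a detailed explanation of these parameters, see this article:

"How to choose the best k and num_candidates for kNN search" (CSDN blog; original title: 如何为 kNN 搜索选择最佳 k 和 num_candidates)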

4. Java API

Add the dependencies to pom.xml:

        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>8.15.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>8.15.2</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>2.0.30</version>
        </dependency>

Test class

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.SortOrder;
import co.elastic.clients.elasticsearch._types.query_dsl.BoolQuery;
import co.elastic.clients.elasticsearch.core.SearchRequest;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ElasticsearchKnnTest {

    public static void main(String[] args) {
        // Create the client; try-with-resources closes the RestClient when done
        try (RestClient restClient = RestClient.builder(HttpHost.create("localhost:9200")).build()) {
            ElasticsearchTransport transport = new RestClientTransport(
                    restClient, new JacksonJsonpMapper());
            ElasticsearchClient client = new ElasticsearchClient(transport);
            // Query vector
            List<Float> queryVector = new ArrayList<>();
            queryVector.add(0.7F);
            queryVector.add(0.66F);
            queryVector.add(1.74F);
            queryVector.add(1.2F);
            queryVector.add(0.9F);
            // Take the top five
            Integer top = 5;
            // Minimum similarity score
            Double minScore = 0.9D;
            // Build the query: kNN search on the feature field, sorted by score
            BoolQuery.Builder builder = new BoolQuery.Builder();
            builder.must(q -> q.knn(n -> n.field("feature").queryVector(queryVector).k(top).numCandidates(10)));
            SearchRequest request = new SearchRequest.Builder().index("image")
                    .minScore(minScore)
                    .query(q -> q.bool(builder.build()))
                    .from(0)
                    .size(10)
                    .sort(s -> s.field(f -> f.field("_score").order(SortOrder.Desc))).build();
            // Typed response, so the hits need no unchecked conversion
            SearchResponse<JSONObject> response = client.search(request, JSONObject.class);
            // Parse and print the search results
            for (Hit<JSONObject> hit : response.hits().hits()) {
                JSONObject data = hit.source();
                System.out.println(data.toJSONString() + "     score: " + hit.score());
            }
        } catch (IOException e) {
            // Covers search failures as well as errors while closing the client
            e.printStackTrace();
        }
    }
}

Results

{"number":6,"feature":[0.7,0.66,1.74,1.2,0.9],"file_name":"6.jpg","data_type":"aa"} score: 0.9999949

{"number":2,"feature":[0.5,0.3,1.7,1.9,1.8],"file_name":"66.jpg","data_type":"aa"} score: 0.9714658

{"number":23,"feature":[1.7,0.8,1.1,1.5,0.9],"file_name":"23.jpg","data_type":"bb"} score: 0.9587538

{"number":7,"feature":[0.2,0.23,1.7,1.5,0.2],"file_name":"88.jpg","data_type":"cc"} score: 0.95746744

{"number":99,"feature":[0.3,1.2,1.7,0.7,1.9],"file_name":"9.jpg","data_type":"gg"} score: 0.949824

{"number":5,"feature":[0.2,1.3,1.7,1.9,0.2],"file_name":"77.jpg","data_type":"bb"} score: 0.94946384

{"number":10,"feature":[0.1,0.5,1.7,0.7,2.9],"file_name":"10.jpg","data_type":"bb"} score: 0.9173416
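To get a feel for where these scores come from, here is a small sketch that computes an exact score by hand. It assumes the index uses the cosine similarity (the mapping does not set one, so this depends on the version's default), for which Elasticsearch documents the score as (1 + cosine) / 2. The class and method names are hypothetical.

```java
final class CosineScore {

    // Plain cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Documented _score formula for the cosine similarity: (1 + cosine) / 2.
    static double cosineScore(double[] query, double[] doc) {
        return (1 + cosine(query, doc)) / 2;
    }

    public static void main(String[] args) {
        double[] query = {0.7, 0.66, 1.74, 1.2, 0.9};
        double[] doc66 = {0.5, 0.3, 1.7, 1.9, 1.8}; // feature of 66.jpg from the results
        System.out.println("66.jpg exact score: " + cosineScore(query, doc66));
    }
}
```

For 66.jpg the exact value comes out at roughly 0.968, close to but not identical to the 0.9714658 reported above. The kNN search is approximate and the index may store quantized vectors, which might account for the gap, though that has not been verified here.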
