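Getting started with Elasticsearch

Understanding ES

Elasticsearch is a powerful open-source search engine that helps you find what you need in massive amounts of content.

Combined with Kibana, Logstash, and Beats, Elasticsearch forms the Elastic Stack (ELK), which is widely used for log analysis and real-time monitoring.

History: Lucene is a Java library and an Apache project, developed by Doug Cutting in 1999.

Lucene's strengths: easy to extend
High performance, based on an inverted index.
Lucene's weaknesses: usable only from Java, steep learning curve, no support for horizontal scaling.

In 2004, Shay Banon built Compass on top of Lucene.

In 2010, Shay Banon rewrote Compass and named it Elasticsearch.

Official site: www.elastic.co/cn/

Compared with Lucene, Elasticsearch:

Supports distributed deployment and scales horizontally
Exposes a RESTful API that can be called from any language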

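Inverted index

A data set is described in terms of document ids and terms.

Document: each record is a document
Term: a word produced by splitting a document's text according to its meaning

Steps: split each document into terms, record for every term the ids of the documents that contain it, and answer a search by splitting the query into terms and looking up their document ids.

Note that document data is serialized to JSON for storage.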
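
The idea can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: the class name `InvertedIndexSketch` is made up, and a whitespace split stands in for a real analyzer such as IK. It is not how Lucene or Elasticsearch store their index internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration: a whitespace "analyzer" stands in for a real one such as IK.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new LinkedHashMap<>();

    // Split text into lower-cased terms (a real analyzer is far smarter).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index one document: record its id under every term it contains.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search: analyze the query, then union the document ids of its terms.
    TreeSet<Integer> search(String query) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, new TreeSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "xiaomi phone");
        index.add(2, "huawei phone");
        index.add(3, "xiaomi charger");
        System.out.println(index.search("xiaomi phone"));
    }
}
```

Looking up a term is a map access rather than a scan over every document, which is where the speed of an inverted index comes from.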

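Index

Index: a collection of documents of the same type.
Mapping: the field constraints for documents in an index, similar to a table schema.

Differences

MySQL: good at transactional operations; it can guarantee data safety and consistency
Elasticsearch: good at searching, analyzing, and computing over massive amounts of data

Kibana

The left-hand navigation bar contains Dev Tools, which lets us quickly send DSL statements to Elasticsearch.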

Analyzing text

_analyze: the built-in API for splitting text into terms

POST /_analyze
{
  "analyzer": "standard", // analyzer to use
  "text": "黑马程序员学习java太棒了！" // text to analyze
}

Using the IK analyzer

ik_smart: fewest splits, coarse-grained
ik_max_word: most splits, fine-grained

POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "黑马程序员学习java太棒了！"
}

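Extending the dictionary

Why extend it

We want some internet slang to count as words
We need to filter out sensitive words

Open the IKAnalyzer.cfg.xml file and reference the dictionary files in it.

Then create those files; they use the .dic extension.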

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
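    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionary here -->
    <entry key="ext_dict">my_ext_dict.dic</entry>
    <!-- Configure your own extension stop-word dictionary here -->
    <entry key="ext_stopwords">my_ext_stopwords.dic</entry>

    <!-- Configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Configure a remote extension stop-word dictionary here -->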
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

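Index operations

Mapping properties

type: the field's data type. Common simple types:
- String: text (analyzed text), keyword (exact values such as brand, country, IP address)
- Numeric: long, integer, short, byte, double, float
- Boolean: boolean
- Date: date
- Object: object
index: whether to index the field, default true (if it is not true, the field cannot be searched)
analyzer: which analyzer to use (only text fields need analysis)
properties: the sub-fields of this field

Index CRUD

Create an index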

PUT /heima
{
  // mappings
  "mappings": {
    // field definitions
    "properties": {
      // a field
      "info": {
        "type": "text",
        // analyzer
        "analyzer": "ik_smart"
      },
      "email": {
        // keyword values are not analyzed
        "type": "keyword",
        // do not index this field (default is true)
        "index": false
      },
      "name": {
        "type": "object",
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

The following response means the index was created:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "heima"
}

Query

GET /indexName

Modify an index

An existing mapping cannot be changed, but new fields can be added to it.

PUT /indexName/_mapping
{
  "properties": {
    "newFieldName": {
      "type": "integer"
    }
  }
}

PUT /heima/_mapping
{
  "properties": {
    "age": {
      "type": "integer"
    }
  }
}

Success:

{
  "acknowledged": true
}

Delete

DELETE /heima

Document operations

Create a document

POST /indexName/_doc/docId
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "subField1": "value3",
        "subField2": "value4"
    }
    // ...
}

POST /heima/_doc/1
{
  "info": "黑马程序员Java讲师",
  "email": "[email protected]",
  "name": {
        "firstName": "云",
        "lastName": "赵"
  }
}

Get and delete a document

GET /indexName/_doc/docId

GET /heima/_doc/1

DELETE /indexName/_doc/docId

DELETE /heima/_doc/1

// fetch all documents in an index
GET /hotel/_search

Full replacement

This deletes the old document and adds the new one.

PUT /indexName/_doc/docId
{
    "field1": "value1",
    "field2": "value2"
    // ... omitted
}

PUT /heima/_doc/1
{
    "info": "黑马程序员高级Java讲师",
    "email": "[email protected]",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

Partial update: change only the specified fields

POST /indexName/_update/docId
{
    "doc": {
         "fieldName": "new value"
    }
}

POST /heima/_update/1
{
  "doc": {
    "email": "[email protected]"
  }
}
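
The difference between the two kinds of update can be modeled with plain maps. This is a hypothetical in-memory sketch (the class `UpdateSemanticsSketch` and the example values are made up); it merges top-level fields only, whereas a real partial update also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of one stored document; not the Elasticsearch API.
class UpdateSemanticsSketch {
    // PUT /indexName/_doc/docId: the request body replaces the stored document entirely.
    static Map<String, Object> fullReplace(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST /indexName/_update/docId: the fields under "doc" are merged into the stored document.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("info", "Java lecturer");
        stored.put("email", "old@example.com");

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("email", "new@example.com");

        System.out.println(fullReplace(stored, change));   // the info field is gone
        System.out.println(partialUpdate(stored, change)); // the info field is kept
    }
}
```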

Field analysis

create table tb_hotel
(
    id        bigint       not null comment 'hotel id'
        primary key,
    name      varchar(255) not null comment 'hotel name',
    address   varchar(255) not null comment 'hotel address',
    price     int          not null comment 'hotel price',
    score     int          not null comment 'hotel rating',
    brand     varchar(32)  not null comment 'hotel brand',
    city      varchar(32)  not null comment 'city',
    star_name varchar(16)  null comment 'hotel star rating, 1 to 5 stars or 1 to 5 diamonds',
    business  varchar(255) null comment 'business district',
    latitude  varchar(32)  not null comment 'latitude',
    longitude varchar(32)  not null comment 'longitude',
    pic       varchar(255) null comment 'hotel picture'
)

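id: ids are strings in ES; type keyword; searchable
name: needs analysis; type text with ik_max_word; searchable
address: needs analysis; type text with ik_max_word; searchable
price: searchable; type integer
score: searchable; type float
brand: searchable, exact match on the brand; type keyword
city: searchable, exact match on the city; keyword
star_name: star rating, exact match; keyword
business: business district, exact match; keyword
location: geo_point, the dedicated type for latitude and longitude
pic: image path, not searchable; keyword

We often need to search across several fields at once, so we can create an all field, copy the relevant fields into it with copy_to, and give it an analyzer.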

PUT hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "float"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword",
        "copy_to": "all"
      },
      "star_name": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word",
      }
    }
  }
}

Using Java

Dependencies

<!-- Requires jackson-databind, which Spring normally provides -->
<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.17.3</version>
</dependency>
<!-- Needed for JSON parsing -->
<dependency>
    <groupId>jakarta.json</groupId>
    <artifactId>jakarta.json-api</artifactId>
    <version>2.0.1</version>
</dependency>
<!-- Not needed with Spring Boot parent 2.7.14 -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.12.5</version>
</dependency>
<!-- The rest client version resolved by default is lower than 8.17.3, so pin it explicitly -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>8.17.3</version>
</dependency>

Initialization

// Create the low-level client
RestClient restClient = RestClient.builder(HttpHost.create("http://127.0.0.1:19200")).build();
// Transport layer, using Jackson for JSON mapping
RestClientTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
// Typed API client built on top of the transport (named client, as in the examples that follow)
ElasticsearchClient client = new ElasticsearchClient(transport);
// Close it when it is no longer needed
client.close();

Create an index

CreateIndexRequest indexRequest = new CreateIndexRequest.Builder()
    .index("hotel").mappings(m ->
            // id
            m.properties("id", p -> p.keyword(k -> k))
                    // name
                    .properties("name", p -> p.text(t -> t.analyzer("ik_max_word").copyTo(List.of("all"))))
                    // address
                    .properties("address", p -> p.text(t -> t.analyzer("ik_max_word").copyTo(List.of("all"))))
                    // price
                    .properties("price", p -> p.integer(i -> i))
                    // rating
                    .properties("score", p -> p.float_(f -> f))
                    // brand
                    .properties("brand", p -> p.keyword(k -> k.copyTo(List.of("all"))))
                    // city
                    .properties("city", p -> p.keyword(k -> k.copyTo(List.of("all"))))
                    // star rating
                    .properties("star_name", p -> p.keyword(k -> k))
                    // business district
                    .properties("business", p -> p.keyword(k -> k))
                    // location
                    .properties("location", p -> p.geoPoint(g -> g))
                    // picture, stored but not indexed
                    .properties("pic", p -> p.keyword(k -> k.index(false)))
                    // combined field that receives the copy_to values
                    .properties("all", p -> p.text(t -> t.analyzer("ik_max_word")))
    ).build();
// Create the index and read the result
CreateIndexResponse createIndexResponse = client.indices().create(indexRequest);
// true means the index was created
System.out.println(createIndexResponse.acknowledged());

Delete an index / check whether it exists

Check existence

Note the class to import: co.elastic.clients.elasticsearch.indices.ExistsRequest

// Build a request that checks whether an index named "hotel" exists
ExistsRequest request = new ExistsRequest.Builder()
    .index("hotel")  // index name
    .build();
// Run the check; the result is a BooleanResponse
BooleanResponse response = client.indices().exists(request);
// response.value() is true when the index exists
System.out.println(response.value());

Delete an index

// Build a request that deletes the index named "hotel"
DeleteIndexRequest request = new DeleteIndexRequest.Builder()
    .index("hotel").build();
// Run the delete
DeleteIndexResponse response = client.indices().delete(request);
// response.acknowledged() tells whether the delete was acknowledged
System.out.println(response.acknowledged());

Add a document

Hotel hotel = hotelService.getById(36934);
// Note: HotelDoc sets this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
HotelDoc hotelDoc = new HotelDoc(hotel);
// Set the index, the document id, and the document body
IndexRequest<Object> request = IndexRequest.of(i -> i.index("hotel").id(String.valueOf(hotelDoc.getId())).document(hotelDoc));
// Index the document
IndexResponse response = client.index(request);
// Print the result
System.out.println(response);
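
The Hotel and HotelDoc classes themselves are not shown in these notes. The following is a guess at the relevant part of the conversion, wrapped in a made-up class `HotelDocSketch` with only a few fields. It shows the one step the comment above points out: joining latitude and longitude into the "lat, lon" string form that a geo_point field accepts. A real class would also need getters (or public fields) so that Jackson can serialize it.

```java
// Hypothetical shapes: the real Hotel entity and HotelDoc class are not shown in this article.
class HotelDocSketch {
    // Simplified stand-in for the database entity.
    static class Hotel {
        final long id;
        final String name;
        final String latitude;
        final String longitude;

        Hotel(long id, String name, String latitude, String longitude) {
            this.id = id;
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Simplified stand-in for the document that is sent to the index.
    static class HotelDoc {
        final long id;
        final String name;
        final String location;

        HotelDoc(Hotel hotel) {
            this.id = hotel.id;
            this.name = hotel.name;
            // geo_point as a string: latitude first, then longitude
            this.location = hotel.latitude + ", " + hotel.longitude;
        }
    }

    public static void main(String[] args) {
        Hotel hotel = new Hotel(36934L, "demo hotel", "31.21", "121.5");
        System.out.println(new HotelDoc(hotel).location);
    }
}
```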

Get a document

// Build the get request
GetRequest request = GetRequest.of(g -> g.index("hotel").id("36934"));
// Run the get
GetResponse<HotelDoc> response = client.get(request, HotelDoc.class);
// Check whether the document exists
if (response.found()) {
    // Read the document
    HotelDoc hotelDoc = response.source();
    System.out.println(hotelDoc);
} else {
    System.out.println("Not found");
}

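Update a document

// Build the partial document; keys must match the field names stored in the index
Map<String, Object> map = Map.of(
        "price", 999,
        "starName", "四钻"
);
UpdateRequest<Object, Object> request = UpdateRequest.of(
        // Set the index, the document id, and the fields to change
        u -> u.index("hotel").id("36934").doc(map));

// Run the update; the class argument has to agree with the request's document type
UpdateResponse<Object> response = client.update(request, Object.class);

// A result of Updated means the update succeeded
if (response.result() == Result.Updated) {
    System.out.println("Update succeeded");
} else {
    System.out.println("Update failed");
}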

Delete a document

// Build the delete request with the index and the document id
DeleteRequest deleteRequest = DeleteRequest.of(builder -> builder.index("hotel").id("36934"));
// Run the delete
DeleteResponse response = client.delete(deleteRequest);
System.out.println(response.id());

Bulk insert

// Load the data
List<Hotel> list = hotelService.list();

// Convert each record into a bulk index operation
List<BulkOperation> list1 = list.stream().map(hotel -> {
    // Build the operation: index name, document id, and document body
    // (converted to HotelDoc so that location is populated, as in the single-document example)
    IndexOperation<Object> operation = IndexOperation.of(idx -> idx.index("hotel")
            .id(String.valueOf(hotel.getId())).document(new HotelDoc(hotel)));

    return BulkOperation.of(op -> op.index(operation));
}).toList();

// Note the package: co.elastic.clients.elasticsearch.core
BulkRequest bulkRequest = BulkRequest.of(b -> b.operations(list1));

BulkResponse response = client.bulk(bulkRequest);

// Handle the response
if (response.errors()) {
    System.err.println("Errors occurred during the bulk insert!");
    response.items().forEach(item -> {
        if (item.error() != null) {
            System.err.println("Error: " + item.error().reason());
        }
    });
} else {
    System.out.println("Bulk insert succeeded!");
}

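Data processing / queries

Elasticsearch provides a JSON-based DSL (Domain Specific Language) for defining queries.

Common query types include:

Match all: returns all documents, mostly used for testing. For example: match_all
Full-text queries: analyze the user's input into terms, then match them against the inverted index.
- match
- multi_match
Exact queries: look up data by exact term value, usually on keyword, numeric, date, and boolean fields.
- ids
- range
- term
Geo queries: search by latitude and longitude. For example:
- geo_distance
- geo_bounding_box
Compound queries: combine the query conditions above into a single query.
- bool
- function_score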

Queries

Basic syntax

GET /indexName/_search
{
  "query": {
    "queryType": {
      "queryCondition": "conditionValue"
    }
  }
}

GET /hotel/_search
{
  "query": {
    // match everything
    "match_all": {}
  }
}

// Response
{
  // time taken
  "took": 1,
  // whether the request timed out
  "timed_out": false,
  "_shards": {},
  // matched documents
  "hits": {
    "total": {
      "value": 202, // total number of hits
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [] // the documents
  }
}

Java implementation

// Create the search request builder
SearchRequest.Builder builder = new SearchRequest.Builder();
// Set the index and the query; here it matches all documents
builder.index("hotel").query(q -> q.matchAll(m -> m));
SearchRequest build = builder.build();
// Run the search
SearchResponse<HotelDoc> search = client.search(build, HotelDoc.class);
System.out.println(search);

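Full-text queries

The user's input is analyzed into terms. The all field is one we defined ourselves through copy_to; it is not part of the stored document.

GET /hotel/_search
{
  "query": {
    // query type
    "match": {
      // search the all field
      "all": "如家外滩"
    }
  }
}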

Multi-field query (less efficient)

GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "如家外滩", 
      // fields to search
      "fields": ["brand", "name", "business"]
    }
  }
}

Exact / range queries

Exact queries look up data by exact term value, usually on keyword, numeric, date, and boolean fields.

Exact match

GET /hotel/_search
{
  "query": {
    // exact match
    "term": {
      // field
      "city": {
        "value": "上海"
      }
    }
  }
}

Range

GET /hotel/_search
{
  "query": {
    "range": {
      // field
      "price": {
        // gt: greater than, gte: greater than or equal, lte: less than or equal
        "gte": 100,
        "lte": 300
      }
    }
  }
}

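Geo queries

Search by latitude and longitude. Common use cases include:

Ctrip: find hotels near me
DiDi: find taxis near me
WeChat: find people near me

geo_bounding_box: finds all documents whose geo_point value falls inside a rectangle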

GET /indexName/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": {
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": {
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}

geo_distance: finds all documents within a given distance of a center point

GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km",
      "location": "31.21, 121.5"
    }
  }
}
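
To get a feel for what a distance filter decides, here is a small sketch that computes great-circle distance with the haversine formula. The class `GeoDistanceSketch` and the 6371 km earth radius are assumptions for illustration; Elasticsearch's own distance calculation may differ in detail. The test point is the coordinate used in the geo sort example in these notes.

```java
// Hypothetical illustration of a distance check; not Elasticsearch's implementation.
class GeoDistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean earth radius, an approximation

    // Great-circle distance between two points, haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the point lies within maxKm of the center.
    static boolean within(double centerLat, double centerLon, double lat, double lon, double maxKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= maxKm;
    }

    public static void main(String[] args) {
        // center "31.21, 121.5" and a point roughly 22 km away
        System.out.println(within(31.21, 121.5, 31.034661, 121.612282, 15.0));
        System.out.println(within(31.21, 121.5, 31.034661, 121.612282, 25.0));
    }
}
```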

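Compound queries

A bool query combines one or more query clauses. The clauses can be combined in these ways:

must: every clause must match, like AND
should: clauses match optionally, like OR
must_not: must not match; does not contribute to the score, like NOT
filter: must match; does not contribute to the score

Only the main keyword clause needs scoring; put the other conditions in filter or must_not, because scoring them costs performance.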

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "如家"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 400
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}
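
The way these clauses interact can be mimicked with a toy evaluator. Everything here is hypothetical (the class `BoolQuerySketch`, the keyword, the 0/1 score); it mirrors the shape of the query above with an ASCII keyword and only shows that must decides both matching and score, while must_not and filter only decide matching.

```java
import java.util.OptionalDouble;

// Hypothetical toy model of bool-query evaluation; not the Elasticsearch API.
class BoolQuerySketch {
    // Toy relevance score for the must clause: 1.0 when the name contains the keyword.
    static double nameScore(String name, String keyword) {
        return name.contains(keyword) ? 1.0 : 0.0;
    }

    // Returns the score of a matching hotel, or empty when the hotel is excluded.
    static OptionalDouble evaluate(String name, int price, double distanceKm) {
        double score = nameScore(name, "Home Inn");
        if (score == 0.0) {
            return OptionalDouble.empty(); // must did not match
        }
        if (price > 400) {
            return OptionalDouble.empty(); // must_not matched
        }
        if (distanceKm > 10.0) {
            return OptionalDouble.empty(); // filter did not match
        }
        return OptionalDouble.of(score); // only the must clause contributed to the score
    }

    public static void main(String[] args) {
        System.out.println(evaluate("Home Inn Bund", 300, 5.0));
        System.out.println(evaluate("Home Inn Airport", 500, 5.0));
    }
}
```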

Java implementation

SearchRequest.Builder builder = new SearchRequest.Builder();

// match query
//        builder.index("hotel").query(q -> q.match(m -> m.field("all").query("如家")));

// multi-field query
//        builder.index("hotel").query(q -> q.multiMatch(m -> m.query("如家")
//                .fields(List.of("brand", "name"))));

// exact (term) query
builder.index("hotel").query(q -> q.term(m -> m.field("city").value("深圳")));
// range query
//        builder.index("hotel").query(q -> q.range(r -> r.number(n ->
//                n.field("price").gte(100.00).lte(150.00))));

// bool query
//        builder.index("hotel").query(q -> q.bool(b ->
//                // must clause
//                b.must(m -> m.term(t -> t.field("city").value("杭州")))
//                        // filter clause
//                        .filter(f -> f.range(r -> r.number(n -> n.field("price").lte(250.00))))));


SearchRequest build = builder.build();
// Run the search
SearchResponse<HotelDoc> search = client.search(build, HotelDoc.class);

HitsMetadata<HotelDoc> hits = search.hits();
long total = hits.total().value();
System.out.println(total);
hits.hits().forEach(hit -> {
    HotelDoc source = hit.source();
    System.out.println(source);
});

System.out.println(search);

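Working with results

Sorting. Note that once you sort by a field, ES no longer computes relevance scores.

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  // sort
  "sort": [
    {
      // first sort key
      "score": {
        "order": "desc"
      }
    },
    {
      // second sort key
      "price": {
        "order": "asc"
      }
    }
  ]
}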

Sort by geographic distance

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      // geo distance sort
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

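Pagination

By default Elasticsearch returns only the top 10 results. To fetch more, change the pagination parameters.

Elasticsearch controls the returned page through the from and size parameters:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 990, // offset of the first document, default 0
  "size": 10, // number of documents to return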
  "sort": [
    {"price": "asc"}
  ]
}

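ES is distributed, so it runs into the deep pagination problem. For example, to get from = 990, size = 10 after sorting by price:

First, every shard sorts its own data and returns its top 1000 documents.
Then the results from all shards are merged and re-sorted in memory to pick the overall top 1000.
Finally, the 10 documents starting at position 990 are taken from those 1000.

The deeper the page, that is the larger the result window (from + size), the more memory and CPU it costs. ES therefore caps the result window at 10000.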
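
The three steps above can be simulated with plain lists. This is a hypothetical sketch (the class `DeepPagingSketch` and the tiny data set are made up) that treats each shard as a list of prices; it shows why the coordinating node ends up holding shardCount * (from + size) entries for a single page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of from + size paging across shards; not Elasticsearch code.
class DeepPagingSketch {
    // Step 1: each shard sorts locally and returns only its first (from + size) values.
    static List<Integer> shardTop(List<Integer> shardPrices, int from, int size) {
        List<Integer> sorted = new ArrayList<>(shardPrices);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(from + size, sorted.size())));
    }

    // Steps 2 and 3: merge all shard results, re-sort, then cut out the requested page.
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardTop(shard, from, size));
        }
        Collections.sort(merged);
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    // Upper bound on the entries the coordinating node has to hold for one request.
    static int candidatesHeld(int shardCount, int from, int size) {
        return shardCount * (from + size);
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = List.of(
                List.of(100, 400, 700),
                List.of(200, 500, 800),
                List.of(300, 600, 900));
        System.out.println(page(shards, 2, 2));
        System.out.println(candidatesHeld(3, 990, 10));
    }
}
```

With three shards, the example request from = 990, size = 10 already means up to 3000 candidate entries for a page of 10.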

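For deep pagination ES offers two solutions, both described in the official documentation:

search after: requires a sort; it fetches the next page starting from the sort values of the last hit of the previous page. This is the officially recommended approach.
scroll: takes a snapshot of the sorted results and keeps it in memory. It is no longer recommended officially.

from + size

Pros: supports jumping to any page
Cons: deep pagination problem; the default limit for from + size is 10000
Use cases: random-access paging as in Baidu, JD, Google, or Taobao search

search after

Pros: no overall limit (the size of a single request still cannot exceed 10000)
Cons: can only page forward one page at a time; no random access
Use cases: search without random paging, such as scrolling down on a phone

scroll

Pros: no overall limit (the size of a single request still cannot exceed 10000)
Cons: extra memory consumption, and the results are not real-time
Use cases: fetching or migrating massive amounts of data. Not recommended since ES 7.1; use search after instead.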
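
The search after idea, continuing from the last sort value instead of skipping an offset, can be sketched like this. The class `SearchAfterSketch` is made up and works on an already sorted in-memory list; it also assumes the sort values are unique, which a real search after request has to ensure with a tiebreaker sort field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor-style paging over a sorted list; not the Elasticsearch API.
class SearchAfterSketch {
    // Returns up to size values strictly greater than after, from an ascending list.
    // Pass null as after to get the first page.
    static List<Integer> nextPage(List<Integer> sortedPrices, Integer after, int size) {
        List<Integer> page = new ArrayList<>();
        for (int price : sortedPrices) {
            if (after != null && price <= after) {
                continue; // already returned on an earlier page
            }
            page.add(price);
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> prices = List.of(100, 200, 300, 400, 500);
        List<Integer> first = nextPage(prices, null, 2);
        // the last sort value of the previous page becomes the cursor for the next one
        List<Integer> second = nextPage(prices, first.get(first.size() - 1), 2);
        System.out.println(first);
        System.out.println(second);
    }
}
```

Because each request only needs the values after the cursor, the cost per page stays the same no matter how far the user has scrolled.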

Java implementation

SearchRequest.Builder builder = new SearchRequest.Builder();
// match_all query
builder.index("hotel").query(q -> q.matchAll(m -> m))
        // pagination
        .from(0).size(5)
        // sort
        .sort(s -> s.field(f -> f.field("price").order(SortOrder.Asc)));
        // more sort keys can be chained
//                .sort();

SearchRequest build = builder.build();
SearchResponse<HotelDoc> search = client.search(build, HotelDoc.class);

HitsMetadata<HotelDoc> hits = search.hits();
long total = hits.total().value();
System.out.println(total);
hits.hits().forEach(hit -> {
    HotelDoc source = hit.source();
    System.out.println(source);
});

System.out.println(search);

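Highlighting

Highlighting means making the search keywords stand out in the search results.

It works like this:

The keywords in the results are wrapped in a tag, em by default
The page then applies a CSS style to that tag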
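
As a rough picture of the first step, the sketch below wraps every occurrence of a keyword in a tag. The class `HighlightSketch` is made up and uses plain string replacement; the real highlighter works on analyzed terms, so treat this only as an illustration of the output shape.

```java
// Hypothetical illustration of tag wrapping; not how the Elasticsearch highlighter works internally.
class HighlightSketch {
    // Wrap every occurrence of keyword in text with the given tags.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(keyword, preTag + keyword + postTag);
    }

    public static void main(String[] args) {
        System.out.println(highlight("Home Inn Bund", "Inn", "<em>", "</em>"));
    }
}
```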

GET /hotel/_search
{
  "query": {
    "match": {
      "all": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        // opening tag
        "pre_tags": "<em>",
        // closing tag
        "post_tags": "</em>",
        // whether the highlighted field must match the queried field, default true
        "require_field_match": "false"
      }
    }
  }
}

Java implementation

SearchRequest.Builder builder = new SearchRequest.Builder();
// match query
builder.index("hotel").query(q -> q.match(m -> m.field("all").query("如家")))
        // highlighting
        .highlight(h ->
                // fields to highlight; more than one is allowed
                h.fields("name", f -> f)
                        // whether the highlighted field must match the queried field
                        .requireFieldMatch(false)
                        // highlight tags
                        .preTags("<em>").postTags("</em>"));

SearchRequest build = builder.build();
SearchResponse<HotelDoc> search = client.search(build, HotelDoc.class);

HitsMetadata<HotelDoc> hits = search.hits();
long total = hits.total().value();
System.out.println(total);
hits.hits().forEach(hit -> {

    // read the highlighted fragments
//            Map<String, List<String>> highlight = hit.highlight();

    HotelDoc source = hit.source();
    System.out.println(source);
});

System.out.println(search);

In practice

Search

@Override
public PageResult search(RequestParam param) {

    SearchRequest.Builder builder = new SearchRequest.Builder();

    // query: match on the all field when a keyword is given, otherwise match all
    builder.index("hotel").query(q -> StringUtils.isNotBlank(param.getKey()) ?
            q.match(m -> m.field("all").query(param.getKey())) : q.matchAll(m -> m));

    // pagination
    builder.from((param.getPage() - 1) * param.getSize()).size(param.getSize());

    try {
        // send the request
        SearchResponse<HotelDoc> response = elasticsearchClient.search(builder.build(), HotelDoc.class);
        HitsMetadata<HotelDoc> hits = response.hits();

        // assemble the result
        Assert.isTrue(hits.total() != null, "total is null");
        long total = hits.total().value();
        List<HotelDoc> hotelDocs = hits.hits().stream().map(Hit::source).toList();
        return new PageResult(total, hotelDocs);
    } catch (IOException e) {
        throw new RuntimeException("Failed to query ES", e);
    }
}

Sorting by location

@Override
public PageResult search(RequestParam param) {

    SearchRequest.Builder builder = new SearchRequest.Builder();

    // build the bool query separately to keep the code readable
    BoolQuery.Builder boolQuery = new BoolQuery.Builder();

    // keyword clause: this one is scored
    boolQuery.must(mu -> StringUtils.isNotBlank(param.getKey()) ?
            mu.match(m -> m.field("all").query(param.getKey())) : mu.matchAll(a -> a));

    // filter conditions
    if (StringUtils.isNotBlank(param.getCity())) {
        boolQuery.filter(f -> f.term(t -> t.field("city").value(param.getCity())));
    }
    if (StringUtils.isNotBlank(param.getBrand())) {
        boolQuery.filter(f -> f.term(t -> t.field("brand").value(param.getBrand())));
    }
    if (param.getMinPrice() != null && param.getMaxPrice() != null) {
        boolQuery.filter(f -> f.range(t -> t.number(n ->
                n.field("price").gte(param.getMinPrice()).lte(param.getMaxPrice()))));
    }

    // query
    builder.index("hotel").query(q -> q.bool(boolQuery.build()));

    // pagination
    builder.from((param.getPage() - 1) * param.getSize()).size(param.getSize());

    // sort
    if (StringUtils.isNotBlank(param.getLocation())) {
        builder.sort(s -> s.geoDistance(geo -> geo.field("location")
                        // passing the coordinates as a list is the form ES recommends
//                .location(l -> l.coords(List.of()))))
                        .location(l -> l.text(param.getLocation()))
                        .order(SortOrder.Asc) // ascending
                        .unit(DistanceUnit.Kilometers) // unit: km
        ));
    }

    try {
        // send the request
        SearchResponse<HotelDoc> response = elasticsearchClient.search(builder.build(), HotelDoc.class);
        HitsMetadata<HotelDoc> hits = response.hits();

        // assemble the result
        Assert.isTrue(hits.total() != null, "total is null");
        long total = hits.total().value();
        List<HotelDoc> hotelDocs = hits.hits().stream().map(hotelDocHit -> {
            if (hotelDocHit.sort().isEmpty()) {
                return hotelDocHit.source();
            }
            // the first sort value is the distance produced by the geo sort
            double value = hotelDocHit.sort().get(0).doubleValue();
            HotelDoc source = hotelDocHit.source();
            assert source != null;
            source.setDistance(value);
            return source;
        }).toList();
        return new PageResult(total, hotelDocs);
    } catch (IOException e) {
        throw new RuntimeException("Failed to query ES", e);
    }
}

Scoring

@Override
public PageResult search(RequestParam param) {

    SearchRequest.Builder builder = new SearchRequest.Builder();

    // build the bool query separately to keep the code readable
    BoolQuery.Builder boolQuery = new BoolQuery.Builder();

    // keyword clause: this one is scored
    boolQuery.must(mu -> StringUtils.isNotBlank(param.getKey()) ?
            mu.match(m -> m.field("all").query(param.getKey())) : mu.matchAll(a -> a));

    // filter conditions
    if (StringUtils.isNotBlank(param.getCity())) {
        boolQuery.filter(f -> f.term(t -> t.field("city").value(param.getCity())));
    }
    if (StringUtils.isNotBlank(param.getBrand())) {
        boolQuery.filter(f -> f.term(t -> t.field("brand").value(param.getBrand())));
    }
    if (param.getMinPrice() != null && param.getMaxPrice() != null) {
        boolQuery.filter(f -> f.range(t -> t.number(n ->
                n.field("price").gte(param.getMinPrice()).lte(param.getMaxPrice()))));
    }

    // query
    builder.index("hotel").query(q -> q.functionScore(f ->
            // wrap the bool query
            f.query(fq -> fq.bool(boolQuery.build()))
                    // boost the documents whose isAD field equals true
                    .functions(fun -> fun.filter(fq -> fq.term(t -> t.field("isAD").value(true)))
                            // weight applied to those documents
                            .weight(10.0))

    ));


    // pagination
    builder.from((param.getPage() - 1) * param.getSize()).size(param.getSize());

    // sort
    if (StringUtils.isNotBlank(param.getLocation())) {
        builder.sort(s -> s.geoDistance(geo -> geo.field("location")
                        // passing the coordinates as a list is the form ES recommends
//                .location(l -> l.coords(List.of()))))
                        .location(l -> l.text(param.getLocation()))
                        .order(SortOrder.Asc) // ascending
                        .unit(DistanceUnit.Kilometers) // unit: km
        ));
    }

    try {
        // send the request
        SearchResponse<HotelDoc> response = elasticsearchClient.search(builder.build(), HotelDoc.class);
        HitsMetadata<HotelDoc> hits = response.hits();

        // assemble the result
        Assert.isTrue(hits.total() != null, "total is null");
        long total = hits.total().value();
        List<HotelDoc> hotelDocs = hits.hits().stream().map(hotelDocHit -> {
            if (hotelDocHit.sort().isEmpty()) {
                return hotelDocHit.source();
            }
            // the first sort value is the distance produced by the geo sort
            double value = hotelDocHit.sort().get(0).doubleValue();
            HotelDoc source = hotelDocHit.source();
            assert source != null;
            source.setDistance(value);
            return source;
        }).toList();
        return new PageResult(total, hotelDocs);
    } catch (IOException e) {
        throw new RuntimeException("Failed to query ES", e);
    }
}
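
What the weight function does to the score can be written out as simple arithmetic. The sketch assumes the function_score defaults, where the function result is multiplied with the query score and documents that match no function keep a neutral factor of 1; the class `FunctionScoreSketch` is made up for illustration.

```java
// Hypothetical arithmetic model of function_score with a single weight function,
// assuming the default boost_mode of multiply.
class FunctionScoreSketch {
    // queryScore: relevance score from the wrapped bool query
    // isAd: whether the document matches the function's filter (isAD == true)
    // weight: the configured weight, 10.0 in the example above
    static double finalScore(double queryScore, boolean isAd, double weight) {
        double functionScore = isAd ? weight : 1.0; // neutral factor when the filter does not match
        return queryScore * functionScore;
    }

    public static void main(String[] args) {
        System.out.println(finalScore(2.0, true, 10.0));
        System.out.println(finalScore(2.0, false, 10.0));
    }
}
```

Keep in mind that once a sort is added, as in the location branch above, results are ordered by the sort values rather than by this score.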

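Aggregations

Aggregations compute statistics, analysis, and calculations over document data. There are three common kinds:

Bucket aggregations: group documents

Terms aggregation: groups by the value of a document field
Date histogram: groups by date interval, for example one bucket per week or per month

Metric aggregations: compute values such as the maximum, minimum, and average

Avg: average
Max: maximum
Min: minimum
Stats: max, min, avg, sum, and so on in one go

Pipeline aggregations: aggregate over the results of other aggregations

Fields used in an aggregation must be one of these types:

keyword
numeric
date
boolean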
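
Bucket and metric aggregations map closely onto grouping and summarizing a collection. The sketch below does both with Java streams on an in-memory list; the class `AggregationSketch`, its Hotel type, and the sample data are made up, and nothing here calls Elasticsearch.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of bucket and metric aggregations; not the Elasticsearch API.
class AggregationSketch {
    static class Hotel {
        final String brand;
        final double price;

        Hotel(String brand, double price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Bucket aggregation (like terms): one bucket per brand, with its document count.
    static Map<String, Long> countByBrand(List<Hotel> hotels) {
        return hotels.stream()
                .collect(Collectors.groupingBy(h -> h.brand, TreeMap::new, Collectors.counting()));
    }

    // Metric aggregation (like stats) inside each bucket: min, max, avg, sum of price.
    static Map<String, DoubleSummaryStatistics> priceStatsByBrand(List<Hotel> hotels) {
        return hotels.stream()
                .collect(Collectors.groupingBy(h -> h.brand, TreeMap::new,
                        Collectors.summarizingDouble(h -> h.price)));
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
                new Hotel("A", 100.0),
                new Hotel("A", 200.0),
                new Hotel("B", 300.0));
        System.out.println(countByBrand(hotels));
        System.out.println(priceStatsByBrand(hotels).get("A").getAverage());
    }
}
```

Grouping by an analyzed text field would not make sense here either, since its values are split into terms; that is one way to read the field type restriction above.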
