Getting Started with Elasticsearch

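1. Introducing Elasticsearch

1.1 Problems with querying data in a database

Low query efficiency

Fuzzy queries in a database generally cannot use an index (for example, a LIKE pattern with a leading wildcard forces a full scan), so query performance is poor when the data volume is large.

Limited functionality

The fuzzy search offered by a database is limited and its matching condition is strict: a row matches only if it contains the user's search keyword exactly.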
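
To make both limitations concrete, here is a minimal Java sketch that mimics a `LIKE '%keyword%'` query with a full scan over in-memory rows. The class name `LikeScanDemo`, the method `likeScan`, and the sample product names are hypothetical and not from the original article; a real database differs in many details, but the cost model (inspect every row) and the strict substring condition are the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LikeScanDemo {

    // Mimics: SELECT name FROM item WHERE name LIKE '%keyword%'
    // Every row is inspected, so the cost grows linearly with the table size.
    static List<String> likeScan(List<String> rows, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            if (row.contains(keyword)) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("Huawei phone", "Xiaomi phone", "Huawei band");
        // Both phone rows contain the keyword.
        System.out.println(likeScan(rows, "phone"));        // [Huawei phone, Xiaomi phone]
        // No row contains this exact string, although "Huawei phone" has both words.
        System.out.println(likeScan(rows, "phone Huawei")); // []
    }
}
```

The second query returns nothing even though one row contains both words, which is the kind of overly strict matching a search engine is meant to relax.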

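1.2 Inverted index

Elasticsearch owes its search performance to the inverted index that underlies it. So what is an inverted index?

The inverted index is defined in contrast to the forward index.

Forward index: suited to exact lookups on the indexed field, but not to fuzzy matching on partial terms.
Inverted index: designed specifically for matching on partial terms.

Two concepts are central to an inverted index:

Document: the data being searched; each record is one document.
Term: a meaningful word obtained by running a tokenization algorithm over document data or over the user's search input.

Building an inverted index is a special kind of processing applied on top of the forward index: the content of every document is split into terms by a tokenizer, and the inverted index records, for each term, the ids of the documents that contain it.

Inverted index search flow: the user's search input is tokenized into terms, each term is looked up in the inverted index to obtain the ids of the matching documents, and those documents are then fetched by id from the forward index.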
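
The build and search steps described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how Elasticsearch (or Lucene) is implemented: the class `InvertedIndexDemo`, its methods, and the sample documents are hypothetical, the tokenizer is a simple lowercase split standing in for a real analyzer, and results are the union of the matches with no relevance scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class InvertedIndexDemo {
    // Forward index: document id -> original text.
    private final Map<Integer, String> documents = new LinkedHashMap<>();
    // Inverted index: term -> ids of the documents that contain the term.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Stand-in analyzer (assumption): lowercase, then split on anything
    // that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing: store the document and record its id under each of its terms.
    void add(int id, String text) {
        documents.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Searching: tokenize the query, look up each term, union the ids,
    // then fetch the documents by id from the forward index.
    List<String> search(String query) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (String term : tokenize(query)) {
            SortedSet<Integer> matched = postings.get(term);
            if (matched != null) {
                ids.addAll(matched);
            }
        }
        List<String> result = new ArrayList<>();
        for (int id : ids) {
            result.add(documents.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone");
        index.add(3, "Huawei Xiaomi charger");
        index.add(4, "Xiaomi band");
        // "huawei" -> {2, 3}, "phone" -> {1, 2}; the union is {1, 2, 3}.
        System.out.println(index.search("Huawei phone"));
        // prints [Xiaomi phone, Huawei phone, Huawei Xiaomi charger]
    }
}
```

Each lookup touches only the postings of the query terms rather than every document, which is why the approach stays fast as the number of documents grows.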

2. Quick start

2.1 Creating an index

Open Kibana and go to Dev Tools.

Run the command below to add a document to ES; if the my_index index does not exist, it is created automatically:

POST /my_index/_doc/1
{
  "title": "Elasticsearch: cool and easy",
  "content": "This is a test document"
}

2.2 Getting a document

Get a document by id:

GET /my_index/_doc/1

Parameters:

my_index: the index name
_doc: a fixed path segment
1: the document id

2.3 Searching documents

GET /my_index/_search
{
  "query": {
    "match": {
      "content": "test"
    }
  }
}

Parameters:

content: a field name in the my_index index.
"test": the search keyword.

2.4 Deleting a document

DELETE /my_index/_doc/1

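3. Index operations

A mapping is similar to a table schema. Elasticsearch can create an index and infer a mapping automatically when the first document arrives, as in section 2.1, but to control field types and analyzers we should create the index and its mapping before storing data.

3.1 Mapping properties

Common mapping properties include:

type: the field data type. Common simple types:
- String: text (analyzed text) and keyword (exact values such as brand, country, or IP address). keyword stores strings that need no tokenization and are typically used for exact-match search.
- Numeric: long, integer, short, byte, double, float
- Boolean: boolean
- Date: date
- Object: object
index: whether the field is indexed (defaults to true)
- true: the field can be searched; if type is text, its content is also analyzed
- false: the field is neither analyzed nor searchable
analyzer: the analyzer used when indexing
search_analyzer: the analyzer used when searching
properties: the sub-fields of this field

Usually the same analyzer is used for indexing and for searching. By default a search uses the analyzer defined in the field mapping; a different one can be set through search_analyzer.

"analyzer": "<index-time analyzer>",
"search_analyzer": "<search-time analyzer>"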

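3.2 Creating an index

Basic syntax:

Method: PUT
Path: /{index_name}, chosen freely
Body: the mapping

The ik_smart and ik_max_word analyzers used below come from the IK analysis plugin, which has to be installed separately.

PUT /{index_name}
{
  "mappings": {
    "properties": {
      "{field_1}": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "{field_2}": {
        "type": "keyword",
        "index": false
      },
      "{field_3}": {
        "properties": {
          "{sub_field}": {
            "type": "keyword"
          }
        }
      }
      // ... further fields omitted
    }
  }
}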

Example:

Document:

{
    "age": 21,
    "weight": 52.1,
    "isMarried": false,
    "info": "Java讲师",
    "email": "hz@itcast.cn",
    "score": [99.1, 99.5, 98.9],
    "name": {
        "firstName": "赵",
        "lastName": "云"
    }
}

The corresponding Elasticsearch mapping request:

PUT /{index_name}
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "weight": { "type": "float" },
      "isMarried": { "type": "boolean" },
      "info": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer":"ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false // email is not indexed, so it is neither analyzed nor searchable
      },
      "score": { "type": "double" },
      "name": {
        "properties": {
          "firstName": { "type": "keyword" },
          "lastName": { "type": "keyword" }
        }
      }
    }
  }
}

3.3 Getting an index

Basic syntax:

Method: GET
Path: /{index_name}
Body: none

GET /{index_name}

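3.4 Modifying an index

PUT /{index_name}/_mapping
{
  "properties": {
    "{new_field}": {
      "type": "integer"
    }
  }
}

This request can only add new fields; the type of an existing field cannot be changed once the index has been created. Modifying an index is therefore strongly discouraged. If a change is unavoidable, delete the index and create it again; the documents then have to be indexed again as well.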

3.5 Deleting an index

Syntax:

Method: DELETE
Path: /{index_name}
Body: none

DELETE /{index_name}

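4. Document operations

4.1 Adding a document

POST /{index_name}/_doc/{id}
{
  "{field_1}": "{value_1}",
  "{field_2}": "{value_2}",
  "{field_3}": {
    "{sub_field_1}": "{value_3}",
    "{sub_field_2}": "{value_4}"
  }
}

4.2 Getting a document

GET /{index_name}/_doc/{id}

4.3 Deleting a document

DELETE /{index_name}/_doc/{id}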

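4.4 Modifying a document

There are two ways to modify a document:

Full update: overwrite the existing document
Partial update: change only some fields of the document

4.4.1 Full update

A full update overwrites the existing document. In essence it is two steps:

Delete the document with the given id
Add a new document with the same id

Note: if the id does not exist when the delete step runs, the add step is still executed, so the update turns into an insert.

PUT /{index_name}/_doc/{id}
{
  "{field_1}": "{value_1}",
  "{field_2}": "{value_2}"
  // ... further fields omitted
}

Because the document with id 1 was deleted earlier, the first execution reports the result created.

Executing the same request a second time reports the result updated.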
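
The created/updated behaviour described above can be modelled with a plain in-memory map. The sketch below is a toy model written for this explanation, not Elasticsearch code: the class `FullUpdateDemo` and its methods are hypothetical, and versioning, refresh, and other real-world details are left out.

```java
import java.util.HashMap;
import java.util.Map;

final class FullUpdateDemo {
    // Toy "index": document id -> document fields.
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    // Models PUT /{index_name}/_doc/{id}: the stored document is replaced
    // as a whole. Returns "created" if the id was absent, else "updated".
    String put(String id, Map<String, Object> document) {
        Map<String, Object> previous = index.put(id, new HashMap<>(document));
        return previous == null ? "created" : "updated";
    }

    Map<String, Object> get(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        FullUpdateDemo demo = new FullUpdateDemo();

        Map<String, Object> first = new HashMap<>();
        first.put("name", "phone");
        first.put("price", 100);
        System.out.println(demo.put("1", first));  // created

        Map<String, Object> second = new HashMap<>();
        second.put("name", "phone pro");
        System.out.println(demo.put("1", second)); // updated

        // The whole document was replaced, so "price" is gone.
        System.out.println(demo.get("1"));         // {name=phone pro}
    }
}
```

The last line shows why a full update has to carry every field: anything missing from the request body is dropped from the stored document.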

4.4.2 Partial update

A partial update changes only some fields of the document that matches the given id.

POST /{index_name}/_update/{id}
{
  "doc": {
    "{field_name}": "{new_value}"
  }
}
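
In contrast to a full update, a partial update merges the fields under "doc" into the stored document and leaves the other fields untouched. The following toy model illustrates that difference; the class `PartialUpdateDemo` and its methods are hypothetical, the merge handles only top-level fields (Elasticsearch also merges nested objects), and a missing document is reported simply by returning false, whereas Elasticsearch responds with an error unless an upsert is supplied.

```java
import java.util.HashMap;
import java.util.Map;

final class PartialUpdateDemo {
    // Toy "index": document id -> document fields.
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    void put(String id, Map<String, Object> document) {
        index.put(id, new HashMap<>(document));
    }

    // Models POST /{index_name}/_update/{id} with a "doc" body:
    // only the listed fields are overwritten, the rest are kept.
    boolean update(String id, Map<String, Object> partialDoc) {
        Map<String, Object> existing = index.get(id);
        if (existing == null) {
            return false; // nothing to merge into
        }
        existing.putAll(partialDoc);
        return true;
    }

    Map<String, Object> get(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        PartialUpdateDemo demo = new PartialUpdateDemo();

        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "phone");
        doc.put("price", 100);
        demo.put("1", doc);

        Map<String, Object> change = new HashMap<>();
        change.put("price", 90);
        demo.update("1", change);

        System.out.println(demo.get("1").get("price")); // 90
        System.out.println(demo.get("1").get("name"));  // phone (unchanged)
    }
}
```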

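5. Java Client

5.1 Configuring the Java client

Reference: the official Elasticsearch Java API Client documentation.

Java Client requirements:

Java 8 or later.
A JSON object mapping library, which lets your application classes integrate seamlessly with the Elasticsearch API.

Lock the versions in the parent project: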

<properties>
  <es.version>7.17.7</es.version>
  <jackson.version>2.13.0</jackson.version>
  <jakarta.json-api.version>2.0.1</jakarta.json-api.version>
</properties>

<!-- Manage dependency versions -->
<dependencyManagement>
  <dependencies>
    <!-- Elasticsearch Java client -->
    <dependency>
      <groupId>co.elastic.clients</groupId>
      <artifactId>elasticsearch-java</artifactId>
      <version>${es.version}</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>${jackson.version}</version>
    </dependency>
    <dependency>
      <groupId>jakarta.json</groupId>
      <artifactId>jakarta.json-api</artifactId>
      <version>${jakarta.json-api.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

Add the dependencies in the child module:

<dependency>
  <groupId>co.elastic.clients</groupId>
  <artifactId>elasticsearch-java</artifactId>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
  <groupId>jakarta.json</groupId>
  <artifactId>jakarta.json-api</artifactId>
</dependency>

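5.2 Creating an index

Using a shopping mall project as the example, we maintain the index data with the Java Client.

Based on the database table structure, the fields correspond to the following mapping properties:

Field	Type	Description	Searchable	Analyzed
id	`keyword`	string, not analyzed	√
name	`text`	string, analyzed for search	√	√
price	`integer`	price in fen (1/100 yuan), hence an integer	√
stock	`integer`	stock count, integer	√
image	`keyword`	string, not analyzed
category	`keyword`	string, not analyzed	√
brand	`keyword`	string, not analyzed	√
sold	`integer`	sales count, integer	√
commentCount	`integer`	number of comments, integer
isAD	`boolean`	boolean	√
updateTime	`date`	update time	√

Create the items index: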

PUT /items
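{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "price":{ "type": "integer" },
      "stock":{ "type": "integer" },
      "image":{ "type": "keyword", "index": false },
      "category":{ "type": "keyword" },
      "brand":{ "type": "keyword" },
      "sold":{ "type": "integer" },
      "commentCount":{ "type": "integer", "index": false },
      "isAD":{ "type": "boolean" },
      "updateTime":{ "type": "date", "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis" }
    }
  }
}

The explicit format on updateTime is needed because the default date format (strict_date_optional_time||epoch_millis) does not accept the space-separated yyyy-MM-dd HH:mm:ss strings that the model class below produces.

Create the model class from the index mapping: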

import com.fasterxml.jackson.annotation.JsonFormat;
import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.Data;

import java.time.LocalDateTime;

@Data
@ApiModel(description = "Index document entity")
public class ItemDoc {
  @ApiModelProperty("Item id")
  private String id;
  @ApiModelProperty("Item name")
  private String name;
  @ApiModelProperty("Price (in fen)")
  private Integer price;
  @ApiModelProperty("Stock")
  private Integer stock;
  @ApiModelProperty("Item image")
  private String image;
  @ApiModelProperty("Category name")
  private String category;
  @ApiModelProperty("Brand name")
  private String brand;
  @ApiModelProperty("Sales count")
  private Integer sold;
  @ApiModelProperty("Number of comments")
  private Integer commentCount;
  @ApiModelProperty("Whether the item is a promoted ad, true/false")
  private Boolean isAD;
  @ApiModelProperty("Update time")
  @JsonFormat(pattern = "yyyy-MM-dd HH:mm:ss", timezone = "GMT+8")
  private LocalDateTime updateTime;
}

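Test code:

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
public class SearchTest {

  @Autowired
  private IItemService itemService;

  private RestClient restClient;
  private ElasticsearchTransport transport;
  private ElasticsearchClient esClient;

  @BeforeEach
  public void setUp() {
    // Use RestClient as the low-level transport
    restClient = RestClient.builder(new HttpHost("192.168.101.68", 9200)).build();

    // Register JavaTimeModule so that LocalDateTime fields can be serialized
    ObjectMapper objectMapper = new ObjectMapper();
    objectMapper.registerModule(new JavaTimeModule());

    // Use Jackson as the JSON mapper
    transport = new RestClientTransport(restClient, new JacksonJsonpMapper(objectMapper));

    // Create the client
    esClient = new ElasticsearchClient(transport);
  }

  // The operations from the following sections go here
  // TODO

  @AfterEach
  public void close() throws IOException {
    transport.close();
  }
}

Place the code from the following sections at the TODO marker and run it.

5.3 Adding a document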

@Test
public void testAdd() throws IOException {
  // Load the row from the database and convert it to the index model
  Item item = itemService.getById(546872L);
  ItemDoc itemDoc = BeanUtil.toBean(item, ItemDoc.class);
  IndexResponse response = esClient.index(
    // Index name
    i -> i.index("items")
    // Document id
    .id(itemDoc.getId())
    // Document body
    .document(itemDoc));

  System.out.println(response);
}

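5.4 Getting a document

@Test
public void testGetById() throws IOException {
  GetResponse<ItemDoc> response = esClient.get(
    // Index name and lookup condition: id = 546872
    g -> g.index("items").id("546872"),
    // Type the document source is deserialized into
    ItemDoc.class);
  if (response.found()) {
    System.out.println(response.source());
  } else {
    System.out.println("document not found");
  }
}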

5.5 Deleting a document

@Test
public void testDeleteDocumentById() throws IOException {
  DeleteResponse response = esClient.delete(
    // Index name and id of the document to delete
    d -> d.index("items").id("100002644680")
  );
  // Print the operation result reported by Elasticsearch
  System.out.println("result: " + response.result().jsonValue());
}

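5.6 Modifying a document

5.6.1 Partial update

@Test
public void testUpdateById() throws IOException {
  // Send only the fields to change (uses java.util.Map and java.util.Collections).
  // A partially populated Item may also serialize its null fields,
  // unless the ObjectMapper is configured to skip them.
  Map<String, Object> partialDoc = Collections.singletonMap("price", 27500);

  UpdateResponse<ItemDoc> response = esClient.update(
    // Merge partialDoc into the document with id 546872 in the items index
    u -> u.index("items").id("546872").doc(partialDoc),
    ItemDoc.class
  );

  System.out.println(response);
}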

5.7 Bulk import

Add all rows from the database to the index as documents:

@Test
public void testBatchDocument() throws Exception {
  int pageNum = 1;
  int pageSize = 1000;

  while (true) {
    // 1. Query the database, importing 1000 rows per page
    Page<Item> page = Page.of(pageNum, pageSize);
    List<Item> items = itemService.page(page).getRecords();

    // Stop once a page comes back empty
    if (CollUtils.isEmpty(items)) { break; }

    // 2. Convert the rows to ItemDoc
    List<ItemDoc> itemDocs = BeanUtil.copyToList(items, ItemDoc.class);
    // 3. Create a BulkRequest builder for adding documents in bulk
    BulkRequest.Builder builder = new BulkRequest.Builder();
    // 4. Add one index operation per document
    itemDocs.forEach(
      itemDoc -> builder.operations(
        b -> b.index(
          i -> i.index("items").id(itemDoc.getId()).document(itemDoc)
        )
      )
    );
    BulkRequest bulkRequest = builder.build();
    // 5. Execute the BulkRequest with the ElasticsearchClient
    BulkResponse response = esClient.bulk(bulkRequest);
    // Report items that failed instead of silently ignoring them
    if (response.errors()) {
      response.items().stream()
        .filter(item -> item.error() != null)
        .forEach(item -> System.out.println(item.id() + ": " + item.error().reason()));
    }

    System.out.println("Page " + pageNum + " imported");
    pageNum++;
  }
}
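
The paging loop above can be tried out without a database or an Elasticsearch cluster. The sketch below isolates just that control flow: an in-memory list stands in for `itemService.page(...)`, and a callback stands in for `esClient.bulk(...)`. The names `PagedImportDemo`, `importInPages`, and `pageOf` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PagedImportDemo {

    // Stand-in for a paged database query: returns the 1-based page
    // pageNum of the source list, or an empty list past the end.
    static <T> List<T> pageOf(List<T> source, int pageNum, int pageSize) {
        int from = Math.min((pageNum - 1) * pageSize, source.size());
        int to = Math.min(from + pageSize, source.size());
        return source.subList(from, to);
    }

    // Reads page after page until an empty page comes back and hands each
    // page to sendBatch (the place where a bulk request would be sent).
    // Returns the total number of rows processed.
    static <T> int importInPages(List<T> source, int pageSize, Consumer<List<T>> sendBatch) {
        int pageNum = 1;
        int total = 0;
        while (true) {
            List<T> page = pageOf(source, pageNum, pageSize);
            if (page.isEmpty()) {
                break;
            }
            sendBatch.accept(page);
            total += page.size();
            pageNum++;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        List<Integer> batchSizes = new ArrayList<>();
        int total = importInPages(rows, 1000, batch -> batchSizes.add(batch.size()));
        System.out.println(batchSizes); // [1000, 1000, 500]
        System.out.println(total);      // 2500
    }
}
```

Sending one bulk request per page keeps each request bounded in size, so the page size is the knob to adjust if requests become too large or too slow.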