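Elasticsearch Basics

Core Concepts

I. How the Inverted Index Works

The inverted index has a counterpart, the forward index, and the two are easiest to understand side by side.

**Forward index:** A forward index maps each document to the words it contains. When a user issues a query (assume a single keyword), the search engine has to scan every document in the index and check, one document at a time, whether it contains the keyword.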

Doc 1 ID → info for word 1; info for word 2; info for word 3...
Doc 2 ID → info for word 3; info for word 2; info for word 4...

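**Inverted index:** The search engine turns the forward index into an inverted index, that is, it changes the "document → words" mapping into a "word → documents" mapping.

Word 1 → Doc 1 ID; Doc 2 ID; Doc 3 ID...
Word 2 → Doc 1 ID; Doc 4 ID; Doc 7 ID...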

Word-document matrix (sample documents):

D1:乔布斯去了中国。
D2:苹果今年仍能占据大多数触摸屏产能。
D3:苹果公司首席执行官史蒂夫·乔布斯宣布,iPad2将于3月11日在美国上市。
D4:乔布斯推动了世界,iPhone、iPad、iPad2,一款一款接连不断。
D5:乔布斯吃了一个苹果。
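
Inverted Index

1. Concepts: An inverted index is one concrete storage format for the word-document matrix. It lets you quickly retrieve the list of documents that contain a given word. It has two main parts: the lexicon and the inverted file.

Lexicon: The usual indexing unit of a search engine is the word. The lexicon is the set of all distinct words that occur in the document collection. Each entry records some information about the word itself plus a pointer to its posting list.

Posting list: A posting list records every document in which a word occurs, along with the positions of the word within each document. Each record is called a posting. From the posting list you can tell which documents contain a word.

**Inverted file:** The posting lists of all words are usually stored sequentially in a file on disk. This file is called the inverted file; it is the physical file that stores the inverted index.

2. A Simple Inverted Index Example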

Doc1:乔布斯去了中国。
Doc2:苹果今年仍能占据大多数触摸屏产能。
Doc3:苹果公司首席执行官史蒂夫·乔布斯宣布,iPad2将于3月11日在美国上市。
Doc4:乔布斯推动了世界,iPhone、iPad、iPad2,一款一款接连不断。
Doc5:乔布斯吃了一个苹果。

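Build a simple inverted index for these five documents. Assume the number in each document name is its document ID, for example the "1" in "Doc1".

| WordID | Word | Posting list (DocID) |
| --- | --- | --- |
| 1 | 乔布斯 | 1, 3, 4, 5 |
| 2 | 苹果 | 2, 3, 5 |
| 3 | iPad2 | 3, 4 |
| 4 | 宣布 | 3 |
| 5 | 了 | 1, 4, 5 |
| ... | ... | ... |

First, a word segmentation system splits each document into a sequence of words, turning the documents into a stream of tokens. Each distinct word is assigned a unique WordID, and each word gets a posting list: the list of documents that contain it.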

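A posting can also carry the term frequency and the positions of the word, written here as (DocID; TF; <positions>):

| WordID | Word | Posting list (DocID; TF; <positions>) |
| --- | --- | --- |
| 1 | 乔布斯 | (1;1;<1>), (3;1;<6>), (4;1;<1>), (5;1;<1>) |
| 2 | 苹果 | (2;1;<1>), (3;1;<1>), (5;1;<5>) |
| 3 | iPad2 | (3;1;<8>), (4;1;<7>) |
| 4 | 宣布 | (3;1;<7>) |
| 5 | 了 | (1;1;<3>), (4;1;<3>), (5;1;<3>) |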
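
As a rough illustration of the tables above, here is a minimal Python sketch that builds a positional inverted index. The token lists are hand-written assumptions (a real system would run a Chinese word segmenter first), and `build_inverted_index` and `postings` are hypothetical helper names, not Elasticsearch APIs.

```python
from collections import defaultdict

# Pre-tokenized documents (Doc1 and Doc5 from the example above).
# The segmentation is written by hand for illustration only.
docs = {
    1: ["乔布斯", "去", "了", "中国"],
    5: ["乔布斯", "吃", "了", "一个", "苹果"],
}


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]}, with 1-based positions."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, token in enumerate(tokens, start=1):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def postings(index, word):
    """Return (doc_id, term_frequency, positions) tuples sorted by doc_id."""
    return [(d, len(p), p) for d, p in sorted(index.get(word, {}).items())]


index = build_inverted_index(docs)
print(postings(index, "乔布斯"))  # [(1, 1, [1]), (5, 1, [1])]
print(postings(index, "苹果"))    # [(5, 1, [5])]
```

Looking up a word is a single dictionary access, which is why a query does not need to scan every document.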

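II. Document, Index, and Type

| Concept | Description | Version notes |
| --- | --- | --- |
| Document | The basic unit of data, stored as JSON (for example, one user record) | Always present |
| Index | A collection of documents, together with field definitions (mappings) and configuration (settings) | Core concept, always present |
| Type | Used in older versions for logical grouping inside an index (loosely like a table partition in a database) | Deprecated in 7.x, where the APIs are typeless by default; removed entirely in 8.x (use separate indices instead) |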

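III. What Shards and Replicas Do

| Concept | Role | Design goal |
| --- | --- | --- |
| Shard | 1. Horizontal scaling: splits an index into subsets that are distributed across nodes. 2. Higher write throughput: data is processed in parallel | Support large data volumes and high concurrency |
| Replica | 1. High availability: a copy of a primary shard that protects against data loss. 2. Higher query throughput: read requests are load-balanced across copies | Fault tolerance and query throughput |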
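
To make the shard idea concrete, the sketch below shows, in simplified form, how a document can be assigned to a primary shard: hash the routing value (the document ID by default) and take it modulo the number of primary shards. This is an approximation under stated assumptions: Elasticsearch uses a Murmur3 hash and a slightly more involved formula, while `pick_shard` is a hypothetical helper that uses CRC32 only because it ships with the Python standard library.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Simplified routing: hash(routing) % number_of_primary_shards.

    CRC32 is a stand-in for the Murmur3 hash that Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


ids = [str(i) for i in range(1, 9)]
with_3_shards = {i: pick_shard(i, 3) for i in ids}
with_4_shards = {i: pick_shard(i, 4) for i in ids}

# Documents whose shard would change if the shard count changed.
moved = [i for i in ids if with_3_shards[i] != with_4_shards[i]]
print(with_3_shards)
print(moved)
```

Because the result depends on the shard count, changing it would send existing IDs to different shards, which is one way to see why the number of primary shards is fixed when the index is created.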

IV. CRUD Operations

CRUD Operations and the REST API

1. Create a Document

Syntax: PUT /{index}/_doc/{id} (explicit ID) or POST /{index}/_doc (auto-generated ID).

PUT /blogs/_doc/1
{
    "title": "Elasticsearch实战应用",
    "author": "张三",
    "content": "Elasticsearch是一个分布式搜索引擎..."
}

PUT /blogs/_doc/2
{
    "title": "Elasticsearch理论基础",
    "author": "李四",
    "content": "Elasticsearch是一个分布式搜索引擎..."
}
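
Response to the second request (result is "updated" and _version is 2 because a document with ID 2 already existed when the request ran):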
{
  "_index" : "blogs",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

2. Query Documents

Syntax: GET /{index}/_doc/{id} or GET /{index}/_search (full-text search)

GET /blogs/_search
{
  "query": {"match": {"content": "Elasticsearch"}}
}


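query: the top-level field that specifies the search condition.
match: the query type. A match query analyzes the query text and returns documents whose field contains matching terms.
content: the name of the field to search.
Elasticsearch: the text to search for, so this request finds documents whose content field contains the term Elasticsearch.

Response: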
{
  "took" : 340,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.13353139,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.13353139,
        "_source" : {
          "title" : "Elasticsearch实战应用",
          "author" : "张三",
          "content" : "Elasticsearch是一个分布式搜索引擎..."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.13353139,
        "_source" : {
          "title" : "Elasticsearch理论基础",
          "author" : "李四",
          "content" : "Elasticsearch是一个分布式搜索引擎..."
        }
      }
    ]
  }
}

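3. Update a Document

Full update: PUT /{index}/_doc/{id} replaces the whole document.

Partial update: POST /{index}/_update/{id} changes only the fields given under doc. (The older typed form POST /{index}/_doc/{id}/_update is deprecated in 7.x and removed in 8.x.)

POST /blogs/_update/1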
{
  "doc": {"title": "Elasticsearch实战"}
}

Response:
{
  "_index" : "blogs",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}
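
The difference between a full and a partial update can be sketched in a few lines of Python. This is a simplified model, not the server implementation: `merge_partial` is a hypothetical helper that merges the `doc` fragment into the existing source, recursing into nested objects and replacing every other value.

```python
def merge_partial(source: dict, partial: dict) -> dict:
    """Return a new document with `partial` merged into `source`."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays are replaced outright.
            merged[key] = value
    return merged


existing = {
    "title": "Elasticsearch实战应用",
    "author": "张三",
    "content": "Elasticsearch是一个分布式搜索引擎...",
}
updated = merge_partial(existing, {"title": "Elasticsearch实战"})
print(updated["title"])   # Elasticsearch实战
print(updated["author"])  # 张三
```

A full update with PUT would instead store only the fields in the request body, so `author` and `content` would be lost unless they were sent again.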

4. Delete a Document

Syntax: DELETE /{index}/_doc/{id}

DELETE /blogs/_doc/2

Response:
{
  "_index" : "blogs",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 6,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 7,
  "_primary_term" : 1
}

Bulk API

Purpose: perform multiple index, update, and delete operations in a single request.

POST _bulk
{"index": {"_index": "logs", "_id": "1"}}
{"message": "日志1"}
{"index": {"_index": "logs", "_id": "2"}}
{"message": "日志2"}
{"delete": {"_index": "logs", "_id": "3"}}


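index: an index operation that adds a document to the given index (or replaces it if the ID already exists).
_index: "logs": the target index is logs.
_id: "1": the document ID is 1.
{"message": "日志1"}: the document body, a log entry with a message field. It goes on the line right after its action line.


delete: a delete operation that removes the document with the given ID from the given index. It has no body line.
_index: "logs": the target index is logs.
_id: "3": the ID of the document to delete is 3.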
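
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and a final newline at the end. A minimal Python sketch of building such a body is shown below; `build_bulk_body` is a hypothetical helper, and sending the body over HTTP is left out.

```python
import json


def build_bulk_body(actions):
    """Build an NDJSON bulk body from (action, metadata, source_or_None) tuples."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if source is not None:  # delete actions have no source line
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("index", {"_index": "logs", "_id": "1"}, {"message": "日志1"}),
    ("index", {"_index": "logs", "_id": "2"}, {"message": "日志2"}),
    ("delete", {"_index": "logs", "_id": "3"}, None),
])
print(body, end="")
```

The output has five lines, matching the request above: two action/source pairs and one delete action.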
GET /logs/_search
{
  "query": {"match_all": {}}
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "logs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "message" : "日志1"
        }
      },
      {
        "_index" : "logs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "message" : "日志2"
        }
      }
    ]
  }
}

Search and Queries

1. Query DSL Structure

Leaf Queries

Leaf queries look for a value in a single field, for example match, term, and range. They support full-text or exact matching.

{"match": {"title": "Elasticsearch"}}

// Search the content field for documents that contain the term Elasticsearch
GET /blogs/_search
{
  "query": {
    "match": {
      "content": "Elasticsearch"
    }
  }
}

// In the logs index, find documents whose timestamp is between 2023-01-01 and 2023-12-31 (inclusive)
GET /logs/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2023-01-01",
        "lte": "2023-12-31"
      }
    }
  }
}

// Find an exact value in a field. The query term is not analyzed, so term suits exact matching on keyword, numeric, and date fields.
GET /products/_search
{
  "query": {
    "term": {
      "category": "electronics"
    }
  }
}
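
The difference between match and term comes down to analysis. The Python sketch below uses a deliberately crude stand-in for an analyzer (lowercase, split on non-word characters); `analyze`, `match_query`, and `term_query` are hypothetical helpers, and the sample value "Consumer Electronics" is an assumption made up for illustration.

```python
import re


def analyze(text: str):
    """Rough stand-in for an analyzer: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def match_query(indexed_terms, query: str) -> bool:
    """match: the query text is analyzed, then any resulting term may match."""
    return any(term in indexed_terms for term in analyze(query))


def term_query(indexed_terms, term: str) -> bool:
    """term: the query term is compared as-is, without analysis."""
    return term in indexed_terms


text_terms = analyze("Consumer Electronics")  # how a text field is indexed
keyword_terms = ["Consumer Electronics"]      # how a keyword field is indexed

print(match_query(text_terms, "ELECTRONICS"))             # True
print(term_query(text_terms, "Electronics"))              # False
print(term_query(keyword_terms, "Consumer Electronics"))  # True
```

This is why a term query on a text field often finds nothing: the indexed terms were lowercased and split, while the query term was not.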

// exists: find documents that have an indexed value for a field.
// In the users index, find documents that have an email field
GET /users/_search
{
  "query": {
    "exists": {
      "field": "email"
    }
  }
}
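
// match_bool_prefix: analyzes the input, matches every term except the last as a term query, and matches the last term as a prefix query.
// The clauses are combined with OR by default, so this finds titles that contain any of the analyzed terms of "生产 实战", not necessarily both words.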
GET /blogs/_search
{
  "query": {"match_bool_prefix": {
    "title": "生产 实战"
  }}
}


Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.5442266,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.5442266,
        "_source" : {
          "title" : "Elasticsearch实战",
          "author" : "张三",
          "content" : "Elasticsearch是一个分布式搜索引擎..."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "WEzTrJUBumy3Y6iJ6MEO",
        "_score" : 2.5153382,
        "_source" : {
          "title" : "Elasticsearch生产测试",
          "author" : "李四五",
          "content" : "Elasticsearch是一个分布式搜索引擎..."
        }
      }
    ]
  }
}

// match_phrase: find documents whose title contains the phrase "开发"
GET /blogs/_search
{
  "query": {"match_phrase": {
    "title": "开发"
  }}
}

// prefix: find values in a field that start with the given prefix (here on the un-analyzed title.keyword sub-field).
GET /blogs/_search
{
  "query": {
    "prefix": {
      "title.keyword": "生产"
    }
  }
}

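Compound Queries

Compound queries combine multiple query clauses with logical operators, for example bool and dis_max. The bool query supports must (AND), should (OR), must_not (NOT), and filter.

Boolean logic:

bool query: a compound query that combines other queries using boolean logic.

must (AND) clauses
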
GET /books/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Elasticsearch"  // the title must match "Elasticsearch"
          }
        },
        {
          "term": {
            "author.keyword": "吴十一"    // exact match on the author
          }
        }
      ]
    }
  }
}


Response:
{
  -----
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 2.6632528,
        "_source" : {
          "title" : "Elasticsearch优化实践",
          "author" : "吴十一",
          "price" : 107.0,
          "publish_date" : "2023-09-09"
        }
      }
    ]
  }
}
GET /books/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Elasticsearch"
          }
        },
        {
          "term": {
            "price": 99.9
          }
        }
      ]
    }
  }
}



{
  ----
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.023975,
        "_source" : {
          "title" : "Elasticsearch实战",
          "author" : "张三",
          "price" : 99.9,
          "publish_date" : "2023-01-01"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "21",
        "_score" : 1.023975,
        "_source" : {
          "title" : "Elasticsearch实战",
          "author" : "张三",
          "price" : 99.9,
          "publish_date" : "2023-01-01"
        }
      }
    ]
  }
}

must_not (NOT) clauses

GET /books/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          // exclude documents whose price is between 10 and 129 (inclusive)
          "range": {
            "price": {
              "gte": 10,
              "lte": 129
            }
          }
        }
      ]
    }
  }
}



{
  ---
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "13",
        "_score" : 0.0,
        "_source" : {
          "title" : "Elasticsearch分布式架构",
          "author" : "赵十五",
          "price" : 130.0,
          "publish_date" : "2024-01-01"
        }
      }
    ]
  }
}

should clauses

Documents that match a should clause get a higher relevance score, but matching is not mandatory unless minimum_should_match requires it.

GET /books/_search
{
  "query": {
    "bool": {
      "should": [
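        {
          "term": {
            "price": {
              "value": 120,
              "boost": 2        // double the weight of this clause
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "核心原理",
              "boost": 1        // 1 is the default, so the weight is unchanged
            }
          }
        }
      ],
      "minimum_should_match": 1  // at least one should clause must match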
    }
  }
}

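match_phrase: the terms must not only be present in the field, they must also appear in the given order.

should is not always a plain OR: when the bool query also contains must or filter clauses, minimum_should_match defaults to 0 and the should clauses only affect scoring. Here it is set to 1, so every hit matches at least one of the two clauses.

Response: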
{
  ----
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 9.41843,
        "_source" : {
          "title" : "Elasticsearch核心原理",
          "author" : "王五",
          "price" : 105.0,
          "publish_date" : "2023-03-03"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.0,
        "_source" : {
          "title" : "深入理解Elasticsearch",
          "author" : "李四",
          "price" : 120.0,
          "publish_date" : "2023-02-02"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "22",
        "_score" : 2.0,
        "_source" : {
          "title" : "深入理解Elasticsearch",
          "author" : "李四",
          "price" : 120.0,
          "publish_date" : "2023-02-02"
        }
      }
    ]
  }
}

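filter clauses

filter clauses are not scored. They only check whether a document matches, which makes the query cheaper, and the results can be cached.
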
GET /books/_search
{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "price": [
            120,
            90
          ]
        }
      }
    }
  }
}

Response:
"hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "title" : "深入理解Elasticsearch",
          "author" : "李四",
          "price" : 120.0,
          "publish_date" : "2023-02-02"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.0,
        "_source" : {
          "title" : "从零开始学Elasticsearch",
          "author" : "刘八",
          "price" : 90.0,
          "publish_date" : "2023-06-06"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "22",
        "_score" : 0.0,
        "_source" : {
          "title" : "深入理解Elasticsearch",
          "author" : "李四",
          "price" : 120.0,
          "publish_date" : "2023-02-02"
        }
      }
    ]
GET /books/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "price": {
            "gte": 85,
            "lte": 90
          }
        }
      }
    }
  }
}



"hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.0,
        "_source" : {
          "title" : "从零开始学Elasticsearch",
          "author" : "刘八",
          "price" : 90.0,
          "publish_date" : "2023-06-06"
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "19",
        "_score" : 0.0,
        "_source" : {
          "title" : "Elasticsearch快速入门",
          "author" : "郑二十一",
          "price" : 85.0,
          "publish_date" : "2024-07-07"
        }
      }
    ]
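
The matching rules of the four bool clauses above can be sketched as a small Python function over in-memory documents. This models matching only, not scoring; `bool_query`, `price_is`, and `title_has` are hypothetical helpers, and the sample data is a subset of the books index used in this article.

```python
def bool_query(docs, must=(), should=(), must_not=(), filters=(),
               minimum_should_match=None):
    """Return the docs that satisfy the bool clauses (matching only, no scoring)."""
    if minimum_should_match is None:
        # Default: 1 if there are should clauses but no must/filter clauses, else 0.
        minimum_should_match = 1 if should and not (must or filters) else 0
    hits = []
    for doc in docs:
        required_ok = all(clause(doc) for clause in (*must, *filters))
        excluded = any(clause(doc) for clause in must_not)
        should_hits = sum(1 for clause in should if clause(doc))
        if required_ok and not excluded and should_hits >= minimum_should_match:
            hits.append(doc)
    return hits


def price_is(value):
    return lambda doc: doc["price"] == value


def title_has(text):
    return lambda doc: text in doc["title"]


books = [
    {"id": "2", "title": "深入理解Elasticsearch", "price": 120.0},
    {"id": "3", "title": "Elasticsearch核心原理", "price": 105.0},
    {"id": "6", "title": "从零开始学Elasticsearch", "price": 90.0},
    {"id": "13", "title": "Elasticsearch分布式架构", "price": 130.0},
]

should_ids = [d["id"] for d in bool_query(
    books, should=[price_is(120.0), title_has("核心原理")])]
must_not_ids = [d["id"] for d in bool_query(
    books, must_not=[lambda d: 10 <= d["price"] <= 129])]
filter_ids = [d["id"] for d in bool_query(
    books, filters=[lambda d: d["price"] in (120.0, 90.0)])]

print(should_ids)    # ['2', '3']
print(must_not_ids)  # ['13']
print(filter_ids)    # ['2', '6']
```

In Elasticsearch the difference between must and filter is scoring rather than matching: both are required, but only must (and should) clauses contribute to `_score`.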

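2. Pagination and Sorting

Pagination:

Shallow pagination:

// Suited to paging through results in a user interface.
// from is the 0-based offset of the first hit to return, so from: 1 skips the first document.
GET /books/_search
{
  "query": {"match_all": {}},
  "from": 1,
  "size": 2
}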

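In testing, the deeper the page, the slower the query: cost grows with from, and the effect is more pronounced on larger data sets. In the author's tests from + size was still acceptable within roughly 10,000 to 50,000 hits (1,000 to 5,000 pages); beyond that the deep pagination problem appears. Note that by default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so going further requires raising that limit.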
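
A back-of-the-envelope sketch shows why cost grows with from: each shard has to collect its top from + size hits, and the coordinating node then merges the candidates from all shards. The helper names below are hypothetical, and the figure of 5 shards is an assumption chosen for illustration.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def hits_to_merge(from_: int, size: int, number_of_shards: int) -> int:
    """Candidates the coordinating node must sort for one page."""
    return number_of_shards * (from_ + size)


def within_window(from_: int, size: int, limit: int = MAX_RESULT_WINDOW) -> bool:
    """True if the request stays inside the result window."""
    return from_ + size <= limit


print(hits_to_merge(0, 10, 5))     # 50
print(hits_to_merge(9990, 10, 5))  # 50000
print(within_window(9990, 10))     # True
print(within_window(10000, 10))    # False
```

Both requests return 10 hits, but the second one sorts a thousand times more candidates to do so.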

Deep pagination:

// Suited to retrieving large result sets batch by batch, for example data export or batch processing
GET /books/_search?scroll=5m
{
  "query": {
    "match_all": {}
  },
  "size": 2
}


GET _search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAD2iEWTC1ENkdnWHpSUGFFVU1Cc2Q2SmhqQQ",
  "scroll": "5m"
}

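The first response contains a _scroll_id. scroll=5m keeps the search context alive for 5 minutes, and each scroll request renews it. Send the _scroll_id to GET _search/scroll to fetch the next batch, and repeat until no more hits are returned.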

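| Method | Pros | Cons | Use case |
| --- | --- | --- | --- |
| from + size | Flexible, simple to implement | Deep pagination problem | Small data sets where the cost of deep pagination is tolerable |
| scroll | Solves the deep pagination problem | Works on a snapshot, so it does not reflect real-time changes; higher maintenance cost, since a scroll_id must be tracked | Exporting or processing very large result sets |
| search_after | Best performance; no deep pagination problem; reflects real-time changes | More complex to implement: it needs a globally unique sort field, and each page depends on the last hit of the previous page, so consecutive paging is harder to build | Paging through very large data sets |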
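
The idea behind search_after can be sketched in Python over an in-memory list: sort by a key that ends in a unique tiebreaker, and ask for the hits that come strictly after the last sort key of the previous page. `search_after_page` is a hypothetical helper, and using the numeric ID as tiebreaker is an assumption for this example.

```python
def search_after_page(docs, size, after=None):
    """Return (page, next_after) for docs ordered by (price, id)."""
    ordered = sorted(docs, key=lambda d: (d["price"], d["id"]))
    if after is not None:
        # Keep only hits whose sort key is strictly greater than the cursor.
        ordered = [d for d in ordered if (d["price"], d["id"]) > tuple(after)]
    page = ordered[:size]
    next_after = [page[-1]["price"], page[-1]["id"]] if page else None
    return page, next_after


books = [
    {"id": 1, "price": 99.9}, {"id": 5, "price": 100.0},
    {"id": 6, "price": 90.0}, {"id": 11, "price": 95.0},
    {"id": 18, "price": 100.0}, {"id": 19, "price": 85.0},
]

seen, after = [], None
while True:
    page, after = search_after_page(books, size=2, after=after)
    if not page:
        break
    seen.extend(d["id"] for d in page)

print(seen)  # [19, 6, 11, 1, 5, 18]
```

No page has to skip over earlier hits, so the cost per page stays flat; the trade-off is that pages can only be read in order.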

Sorting:

GET /books/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 200,  // the default page size is 10, so set size explicitly to show all documents
  // sort by price ascending; within the same price, by publish_date descending
  "sort": [
    {
      "price": "asc"
    },
    {
      "publish_date": "desc"
    }
  ]
}
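
For comparison, the same two-level ordering can be reproduced in Python with two stable sorts, applying the secondary key first. This is only an illustration of the ordering semantics on three sample books, not how Elasticsearch sorts internally.

```python
books = [
    {"id": "5", "price": 100.0, "publish_date": "2023-05-05"},
    {"id": "18", "price": 100.0, "publish_date": "2024-06-06"},
    {"id": "6", "price": 90.0, "publish_date": "2023-06-06"},
]

# Python's sort is stable, so sort by the secondary key first...
by_date_desc = sorted(books, key=lambda b: b["publish_date"], reverse=True)
# ...then by the primary key; ties keep the publish_date order.
result = sorted(by_date_desc, key=lambda b: b["price"])

print([b["id"] for b in result])  # ['6', '18', '5']
```

The two books priced at 100.0 come out newest first, which is what the second sort entry specifies.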

V. Indices

Creating an Index

PUT /my_index
{
  "settings": {
    "number_of_shards": 3,    // number of primary shards (fixed at creation time)
    "number_of_replicas": 2   // number of replicas per primary shard (improves fault tolerance)
  }
}

PUT /my_index_1
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "price": {
        "type": "float"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}


GET /my_index_1

Modifying an Index

// change a setting
PUT /my_index_1/_settings
{
  "number_of_replicas": 2
}

// add a field to the mapping
PUT /my_index_1/_mapping
{
  "properties": {
    "address1": { "type": "text" }
  }
}

Migrating an Index

// create the source index
PUT /test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "keyword" }
    }
  }
}


// add documents
POST _bulk
{"index": {"_index": "test"}}
{"field1": "日志1","field2":23}
{"index": {"_index": "test"}}
{"field1": "日志2","field2":24}
{"index": {"_index": "test"}}
{"field1": "日志3","field2":25}
{"index": {"_index": "test"}}
{"field1": "日志4","field2":26}


// create the destination index
PUT /test2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "keyword" }
    }
  }
}

// migrate the data (reindex copies documents; it does not remove them from the source)
POST /_reindex
{
  "source": { "index": "test" },
  "dest": { "index": "test2" }
}


DELETE /test

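VI. Aggregations

Metric Aggregations

Definition: compute a statistic over a numeric field and return a single value (or, for stats, a small set of values).

sum: total of the field; avg: average; min/max: extremes; stats: combined statistics (count, sum, min, max, avg).
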
GET /books/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 110,
        "lte": 119
      }
    }
  },
  "aggs": {
    "price_sum": {
      "sum": {
        "field": "price"
      }
    },
    "price_avg":{
      "avg": {
        "field": "price"
      }
    },
    "price_max":{
      "max": {
        "field": "price"
      }
    },
    "price_min":{
      "min": {
        "field": "price"
      }
    },
    "price_stats":{
      "stats": {
        "field": "price"
      }
    }
  }
}
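
To check what these metrics should return, the same numbers can be computed by hand. The sketch below uses the five books from the sample data at the end of this article whose price falls between 110 and 119; the dictionary layout mirrors the fields of a stats result but is otherwise an assumption.

```python
# Prices of the books with 110 <= price <= 119 (IDs 4, 7, 12, 14, 15).
prices = [110.0, 115.0, 110.0, 112.0, 118.0]

stats = {
    "count": len(prices),
    "min": min(prices),
    "max": max(prices),
    "avg": sum(prices) / len(prices),
    "sum": sum(prices),
}
print(stats)
# {'count': 5, 'min': 110.0, 'max': 118.0, 'avg': 113.0, 'sum': 565.0}
```

The stats aggregation returns all five values in one pass, so the separate sum, avg, min, and max aggregations are only needed when a single value is wanted.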
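
Bucket Aggregations

Definition: group documents into buckets by field value or condition, similar to GROUP BY in SQL.

terms: group by field value; date_histogram: group by time interval; range: group by numeric range.

// each range bucket includes from and excludes to: [from, to)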
GET /books/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 110,
        "lte": 119
      }
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "按标题分组": {
      "terms": {
        "field": "title.keyword",
        "size": 100
      }
    },
    "按年份分组": {
      "date_histogram": {
        "field": "publish_date",
        "calendar_interval": "year"  // "interval" is deprecated since 7.2 and removed in 8.0
      },
      "aggs": {
        "按数值范围分组": {
          "range": {
            "field": "price",
            "ranges": [
              {
                "from": 110,
                "to": 113
              },
              {
                "from": 113,
                "to": 115
              },
              {
                "from": 115,
                "to": 119
              }
            ]
          },
          "aggs": {
            "按价格分组": {
              "terms": {
                "field": "price",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}

Response:
{
 ---
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      ---
    ]
  },
  "aggregations" : {
    "按年份分组" : {
      "buckets" : [
        {
          "key_as_string" : "2023-01-01T00:00:00.000Z",
          "key" : 1672531200000,
          "doc_count" : 3,
          "按数值范围分组" : {
            "buckets" : [
              {
                "key" : "110.0-113.0",
                "from" : 110.0,
                "to" : 113.0,
                "doc_count" : 2,
                "按价格分组" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : 110.0,
                      "doc_count" : 2
                    }
                  ]
                }
              },
              {
                "key" : "113.0-115.0",
                "from" : 113.0,
                "to" : 115.0,
                "doc_count" : 0,
                "按价格分组" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [ ]
                }
              },
              {
                "key" : "115.0-119.0",
                "from" : 115.0,
                "to" : 119.0,
                "doc_count" : 1,
                "按价格分组" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : 115.0,
                      "doc_count" : 1
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "key_as_string" : "2024-01-01T00:00:00.000Z",
          "key" : 1704067200000,
          "doc_count" : 2,
          "按数值范围分组" : {
            "buckets" : [
              {
                "key" : "110.0-113.0",
                "from" : 110.0,
                "to" : 113.0,
                "doc_count" : 1,
                "按价格分组" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : 112.0,
                      "doc_count" : 1
                    }
                  ]
                }
              },
              {
                "key" : "113.0-115.0",
                "from" : 113.0,
                "to" : 115.0,
                "doc_count" : 0,
                "按价格分组" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [ ]
                }
              },
              {
                "key" : "115.0-119.0",
                "from" : 115.0,
                "to" : 119.0,
                "doc_count" : 1,
                "按价格分组" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : 118.0,
                      "doc_count" : 1
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    },
    "按标题分组" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Elasticsearch与云计算",
          "doc_count" : 1
        },
        {
          "key" : "Elasticsearch性能调优",
          "doc_count" : 1
        },
        {
          "key" : "Elasticsearch数据管理",
          "doc_count" : 1
        },
        {
          "key" : "Elasticsearch日志分析",
          "doc_count" : 1
        },
        {
          "key" : "Elasticsearch高级搜索",
          "doc_count" : 1
        }
      ]
    }
  }
}
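
The nested buckets in this response can be reproduced with a short Python sketch: group by year first, then count prices per half-open range [from, to). `bucket_by_year_and_range` is a hypothetical helper, and the input is the same five books that match the query.

```python
# (publish_date, price) of the five books with 110 <= price <= 119.
books = [
    ("2023-04-04", 110.0), ("2023-07-07", 115.0), ("2023-12-12", 110.0),
    ("2024-02-02", 112.0), ("2024-03-03", 118.0),
]
ranges = [(110, 113), (113, 115), (115, 119)]


def bucket_by_year_and_range(books, ranges):
    """Count books per year, then per price range [low, high)."""
    result = {}
    for publish_date, price in books:
        year = publish_date[:4]
        counts = result.setdefault(year, {r: 0 for r in ranges})
        for low, high in ranges:
            if low <= price < high:  # from is inclusive, to is exclusive
                counts[(low, high)] += 1
    return result


buckets = bucket_by_year_and_range(books, ranges)
print(buckets["2023"])  # {(110, 113): 2, (113, 115): 0, (115, 119): 1}
print(buckets["2024"])  # {(110, 113): 1, (113, 115): 0, (115, 119): 1}
```

The counts match the doc_count values in the response, including the empty 113 to 115 bucket in both years.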

Nested Aggregations and Ordering

Ordering buckets by a metric sub-aggregation:

GET /books/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 120
      }
    }
  },
  "aggs": {
    "year_group": {
      "date_histogram": {
        "field": "publish_date",
        "calendar_interval": "year",
        "order": {
          "price_sum": "asc"
        }
      },
      "aggs": {
        "price_sum": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

Ordering by several criteria:

GET /books/_search
{
  "aggs": {
    "year_group": {
      "date_histogram": {
        "field": "publish_date",
        "calendar_interval": "year",
        "order": [
          {
            "price_sum": "asc"
          },
          {
            "price_avg": "asc"
          }
        ]
      },
      "aggs": {
        "price_sum": {
          "sum": {
            "field": "price"
          }
        },
        "price_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

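VII. Mapping and Modeling

Dynamic Mapping vs Explicit Mapping

| Type | Key characteristics | When to use |
| --- | --- | --- |
| Dynamic mapping | Field types are inferred automatically and new fields are added on the fly (behaviour is controlled by the dynamic parameter) | Quickly indexing data whose structure is not known in advance, with little manual work |
| Explicit mapping | Field types and attributes (analyzers, index options, and so on) are defined by hand, which keeps data consistent | Scenarios that need a strictly controlled data model (for example financial data) |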
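
Dynamic mapping can lead to field type conflicts: the first value seen decides the type, so a field first indexed with a number is mapped as numeric and later non-numeric strings are rejected. Use the dynamic parameter to restrict it.
Explicit mapping requires planning the field structure up front, but it lets you optimize storage and query performance.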
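
As a rough sketch of what dynamic mapping does, the Python function below guesses a field type from a JSON value in the way the default rules do: booleans, integers, floats, and objects map directly, strings that look like dates become date, and other strings become text with a keyword sub-field. `infer_field_type` is a hypothetical helper, and date detection is simplified here to what `date.fromisoformat` accepts, which is an assumption rather than the real set of detection formats.

```python
from datetime import date


def infer_field_type(value):
    """Guess the mapping type that dynamic mapping would pick for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # simplified stand-in for date detection
            return "date"
        except ValueError:
            return "text + keyword sub-field"
    return "unknown"


doc = {"title": "Elasticsearch实战", "price": 99.9, "publish_date": "2023-01-01"}
for field, value in doc.items():
    print(field, "->", infer_field_type(value))
```

Since the guess is made from the first value seen, a misleading first document fixes the wrong type for the whole index, which is the conflict risk described above.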
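
Field Data Types (text, keyword, date)

Elasticsearch provides many field types. The core ones include:

Text

Full-text search field. Values are analyzed (tokenized) at index time, with the standard analyzer unless another analyzer is configured.

Example: "title": {"type": "text"}

Keyword

Field for exact matching, sorting, and aggregations. Values are not tokenized.

Example: "category": {"type": "keyword"}

Date

Accepts ISO 8601 formatted strings and epoch-millisecond timestamps by default, and parses them automatically.

Example: "created_at": {"type": "date"}

Compound Types

Object: an inner JSON object. Its fields are flattened into the parent document rather than indexed separately.

Nested: each element of an array of objects is indexed as its own hidden document, so conditions can be matched within a single element using nested queries.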
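
The practical difference between object and nested shows up with arrays of objects. The Python sketch below models it under simplifying assumptions: `flatten_objects`, `object_match`, and `nested_match` are hypothetical helpers, and the `user` field with its two entries is made-up sample data.

```python
def flatten_objects(field, objects):
    """object type: inner fields are flattened into multi-value lists."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(flat, conditions):
    """Each condition is checked against the flattened values independently."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


def nested_match(objects, conditions):
    """All conditions must hold within the same array element."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in objects
    )


users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_objects("user", users)

# There is no user named "Alice Smith", but the flattened form still matches.
print(object_match(flat, {"user.first": "Alice", "user.last": "Smith"}))  # True
print(nested_match(users, {"first": "Alice", "last": "Smith"}))           # False
print(nested_match(users, {"first": "Alice", "last": "White"}))           # True
```

Flattening loses the association between `first` and `last`, so nested is the safer choice whenever conditions must apply to the same element.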

All Code

PUT /blogs/_doc/1
{
  "title": "Elasticsearch实战应用",
  "author": "张三",
  "content": "Elasticsearch是一个分布式搜索引擎..."
}

PUT /blogs/_doc/2
{
  "title": "Elasticsearch理论基础",
  "author": "李四五",
  "content": "Elasticsearch是一个分布式搜索引擎..."
}
POST /blogs/_doc
{
  "title": "生产测试",
  "author": "李四五",
  "content": "Elasticsearch是一个分布式搜索引擎..."
}

POST /blogs/_doc
{
  "title": "生产测试Elasticsearch",
  "author": "李四五",
  "content": "Elasticsearch是一个分布式搜索引擎..."
}

GET /blogs/_search
{
  "query": {"match": {"title": "Elasticsearch"}}
}

GET /blogs/_search
{
  "query": {"match_all": {}}
}

POST /blogs/_update/1
{
  "doc": {"title": "Elasticsearch实战"}
}

DELETE /blogs/_doc/2



POST _bulk
{"index": {"_index": "logs", "_id": "1"}}
{"message": "日志1"}
{"index": {"_index": "logs", "_id": "2"}}
{"message": "日志2"}
{"delete": {"_index": "logs", "_id": "3"}}

GET /logs/_search
{
  "query": {"match_all": {}}
}

GET /logs/_search
{
  "query": {"match": {
    "message": "日志"
  }}
}

GET /blogs/_search
{
  "query": {"match_bool_prefix": {
    "title": "生产 实战"
  }}
}

GET /blogs/_search
{
  "query": {"match_phrase": {
    "title": "开发"
  }}
}


GET /blogs/_search
{
  "query": {
    "prefix": {
      "title.keyword": "Elasticsearch"
    }
  }
}


POST _bulk
{"index": {"_index": "books", "_id": "1"}}
{"title": "Elasticsearch实战", "author": "张三", "price": 99.9, "publish_date": "2023-01-01"}
{"index": {"_index": "books", "_id": "2"}}
{"title": "深入理解Elasticsearch", "author": "李四", "price": 120.0, "publish_date": "2023-02-02"}
{"index": {"_index": "books", "_id": "3"}}
{"title": "Elasticsearch核心原理", "author": "王五", "price": 105.0, "publish_date": "2023-03-03"}
{"index": {"_index": "books", "_id": "4"}}
{"title": "Elasticsearch高级搜索", "author": "赵六", "price": 110.0, "publish_date": "2023-04-04"}
{"index": {"_index": "books", "_id": "5"}}
{"title": "Elasticsearch集群管理", "author": "陈七", "price": 100.0, "publish_date": "2023-05-05"}
{"index": {"_index": "books", "_id": "6"}}
{"title": "从零开始学Elasticsearch", "author": "刘八", "price": 90.0, "publish_date": "2023-06-06"}
{"index": {"_index": "books", "_id": "7"}}
{"title": "Elasticsearch日志分析", "author": "孙九", "price": 115.0, "publish_date": "2023-07-07"}
{"index": {"_index": "books", "_id": "8"}}
{"title": "Elasticsearch与大数据", "author": "周十", "price": 125.0, "publish_date": "2023-08-08"}
{"index": {"_index": "books", "_id": "9"}}
{"title": "Elasticsearch优化实践", "author": "吴十一", "price": 107.0, "publish_date": "2023-09-09"}
{"index": {"_index": "books", "_id": "10"}}
{"title": "Elasticsearch高效开发", "author": "郑十二", "price": 108.0, "publish_date": "2023-10-10"}
{"index": {"_index": "books", "_id": "11"}}
{"title": "Elasticsearch搜索引擎", "author": "王十三", "price": 95.0, "publish_date": "2023-11-11"}
{"index": {"_index": "books", "_id": "12"}}
{"title": "Elasticsearch数据管理", "author": "李十四", "price": 110.0, "publish_date": "2023-12-12"}
{"index": {"_index": "books", "_id": "13"}}
{"title": "Elasticsearch分布式架构", "author": "赵十五", "price": 130.0, "publish_date": "2024-01-01"}
{"index": {"_index": "books", "_id": "14"}}
{"title": "Elasticsearch性能调优", "author": "陈十六", "price": 112.0, "publish_date": "2024-02-02"}
{"index": {"_index": "books", "_id": "15"}}
{"title": "Elasticsearch与云计算", "author": "刘十七", "price": 118.0, "publish_date": "2024-03-03"}
{"index": {"_index": "books", "_id": "16"}}
{"title": "Elasticsearch安全管理", "author": "孙十八", "price": 122.0, "publish_date": "2024-04-04"}
{"index": {"_index": "books", "_id": "17"}}
{"title": "Elasticsearch扩展与插件", "author": "周十九", "price": 125.0, "publish_date": "2024-05-05"}
{"index": {"_index": "books", "_id": "18"}}
{"title": "Elasticsearch最佳实践", "author": "吴二十", "price": 100.0, "publish_date": "2024-06-06"}
{"index": {"_index": "books", "_id": "19"}}
{"title": "Elasticsearch快速入门", "author": "郑二十一", "price": 85.0, "publish_date": "2024-07-07"}
{"index": {"_index": "books", "_id": "20"}}
{"title": "Elasticsearch高级指南", "author": "王二十二", "price": 128.0, "publish_date": "2024-08-08"}


POST _bulk
{"index": {"_index": "books"}}
{"title": "Elasticsearch实战", "author": "张三", "price": 99.9, "publish_date": "2025-03-01"}
{"index": {"_index": "books"}}
{"title": "深入理解Elasticsearch", "author": "李四", "price": 120.0, "publish_date": "2025-01-02"}

GET /books/_search

GET /books/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Elasticsearch"
          }
        },
        {
          "term": {
            "price": 99.9
          }
        }
      ]
    }
  }
}

GET /books/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "range": {
            "price": {
              "gte": 10,
              "lte": 129
            }
          }
        }
      ]
    }
  }
}

GET /books/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "price": {
              "value": 120,
              "boost": 2
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "核心原理",
              "boost": 1
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}


GET /books/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "price": {
            "gte": 85,
            "lte": 90
          }
        }
      }
    }
  }
}


GET /books/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {"title": "E"}
      },
      "functions": [
        {
          "filter": {
            "term": {
              "price": 120
            }
          },
          "weight": 2  // assumed value: each entry in functions needs a score function, and none was given
        }
      ]
    }
  }
}

GET /books/_search
{
  "query": {"match_all": {}},
  "from": 0,
  "size": 2
}

GET /books/_search
{
  "query": {"match_all": {}},
  "from": 1,
  "size": 2
}

GET /books/_search
{
  "query": {"match_all": {}}
}


GET /books/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 200,
  "sort": [
    {
      "price": "asc"
    },
    {
      "publish_date": "desc"
    }
  ]
}

GET /books/_search?scroll=5m
{
  "query": {
    "match_all": {}
  },
  "size": 2
}


GET _search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAD2iEWTC1ENkdnWHpSUGFFVU1Cc2Q2SmhqQQ",
  "scroll": "5m"
}

GET /books/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 110,
        "lte": 119
      }
    }
  },
  "aggs": {
    "price_sum": {
      "sum": {
        "field": "price"
      }
    },
    "price_avg":{
      "avg": {
        "field": "price"
      }
    },
    "price_max":{
      "max": {
        "field": "price"
      }
    },
    "price_min":{
      "min": {
        "field": "price"
      }
    },
    "price_stats":{
      "stats": {
        "field": "price"
      }
    }
  }
}

GET /books/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 110,
        "lte": 119
      }
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "按标题分组": {
      "terms": {
        "field": "title.keyword",
        "size": 100
      }
    },
    "按年份分组": {
      "date_histogram": {
        "field": "publish_date",
        "calendar_interval": "year"
      },
      "aggs": {
        "按数值范围分组": {
          "range": {
            "field": "price",
            "ranges": [
              {
                "from": 110,
                "to": 113
              },
              {
                "from": 113,
                "to": 115
              },
              {
                "from": 115,
                "to": 119
              }
            ]
          },
          "aggs": {
            "按价格分组": {
              "terms": {
                "field": "price",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}


GET /books/_search
{
  "aggs": {
    "year_group": {
      "date_histogram": {
        "field": "publish_date",
        "calendar_interval": "year",
        "order": [
          {
            "price_sum": "asc"
          },
          {
            "price_avg": "asc"
          }
        ]
      },
      "aggs": {
        "price_sum": {
          "sum": {
            "field": "price"
          }
        },
        "price_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}



PUT /ceshi1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}


PUT /my_index_1
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "price": {
        "type": "float"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

GET /my_index_1



GET /ceshi1/_mapping


PUT /my_index_1/_settings
{
  "number_of_replicas": 2
}

PUT /my_index_1/_mapping
{
  "properties": {
    "address1": { "type": "text" }
  }
}


PUT /test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "keyword" }
    }
  }
}

POST _bulk
{"index": {"_index": "test"}}
{"field1": "日志1","field2":23}
{"index": {"_index": "test"}}
{"field1": "日志2","field2":24}
{"index": {"_index": "test"}}
{"field1": "日志3","field2":25}
{"index": {"_index": "test"}}
{"field1": "日志4","field2":26}

PUT /test2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "keyword" }
    }
  }
}

POST /_reindex
{
  "source": { "index": "test" },
  "dest": { "index": "test2" }
}


DELETE /test

GET /test/_search