ElasticSearch学习笔记五:ES查询(一)

一、前言

上一篇文章我们学习了ES的基本操作和数据类型,接下来就是ES中比较重要的查询操作了,ES的出现就是为了解决搜索问题,正如他的标语 You Know For Search。当然搜索是一个很复杂的功能,我们也是循序渐进的学习,一开始会是一些比较简单的案例。

二、数据准备

1、创建索引

PUT /hotel
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text" 
      },
      "city":{
        "type": "keyword"
      },
      "price":{
        "type": "double"
      },
      "create_time":{
        "type": "date",
         "format": "yyyy-MM-dd HH:mm:ss"
      },
      "amenities":{
        "type": "text"
      },
      "full_room":{
        "type": "boolean"
      },
      "location":{
        "type": "geo_point"
      },
      "praise":{
        "type": "integer"
      }
    }
  }

说明:我们创建了一个索引,有以下几个字段

|-------------|-----------|-----------|
| 字段 | 类型 | 含义 |
| title | text | 标题 |
| city | keyword | 所在城市 |
| price | double | 价格 |
| create_time | date | 创建时间 |
| amenities | text | 便利设施 |
| full_room | boolean | 是否满房 |
| location | gen_point | 地理位置(经纬度) |
| praise | integer | 好评数量 |

2、写入数据,这里我们使用批量写入数据,使用bulk

POST /_bulk
{"index":{"_index":"hotel","_id":"001"}}
{"title":"文雅酒店","city":"青岛","price":556.00,"create_time":"2020-04-18 12:00:00","amenities":"浴池,普通停车场/充电停车场","full_room":false,"location":{"lat":36.083078,"lon":120.37566},"praise":10}
{"index":{"_index":"hotel","_id":"002"}}
{"title":"金都嘉怡假日酒店","city":"北京","price":337.00,"create_time":"2021-03015 20:00:00","amenities":"wifi,充电停车场/可升降停车场","full_room":false,"location":{"lat":39.915153,"lon":116.4030},"praise":60}
{"index":{"_index":"hotel","_id":"003"}}
{"itle":"金都欣欣酒店","city":"天津","price":200.00,"create_ime":"2021-05-09 16:00:00","amenities":"提供假日party,免费早餐,可充电停车场","full_room":true,"location":{"lat":39.186555,"lon":117.162007},"praise":30}
{"index":{"_index":"hotel","_id":"004"}}
{"title":"金都酒店","city":"北京","price":500.00,"create_time":"2021-02-18 08:00:00","amenities":"浴池(假日需预定),室内游泳池,普通停车场","full_room":true,"location":{"lat":39.915343,"lon":116.4239},"praise":20}
{"index":{"_index":"hotel","_id":"005"}}
{"title":"文雅精选酒店","city":"北京","price":800.00,"create_time":"2021-01-01 08:00:00","amenities":"浴池(假日需预定),wifi,室内游泳池,普通停车场","full_room":true,"location":{"lat":39.918229,"lon":116.422011},"praise":20}

3、我们先查看一下所有的数据

命令:GET /hotel/_search

结果:

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "took" : 852,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "001",
        "_score" : 1.0,
        "_source" : {
          "title" : "文雅酒店",
          "city" : "青岛",
          "price" : 556.0,
          "create_time" : "2020-04-18 12:00:00",
          "amenities" : "浴池,普通停车场/充电停车场",
          "full_room" : false,
          "location" : {
            "lat" : 36.083078,
            "lon" : 120.37566
          },
          "praise" : 10
        }
      },
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "003",
        "_score" : 1.0,
        "_source" : {
          "itle" : "金都欣欣酒店",
          "city" : "天津",
          "price" : 200.0,
          "create_ime" : "2021-05-09 16:00:00",
          "amenities" : "提供假日party,免费早餐,可充电停车场",
          "full_room" : true,
          "location" : {
            "lat" : 39.186555,
            "lon" : 117.162007
          },
          "praise" : 30
        }
      },
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "004",
        "_score" : 1.0,
        "_source" : {
          "title" : "金都酒店",
          "city" : "北京",
          "price" : 500.0,
          "create_time" : "2021-02-18 08:00:00",
          "amenities" : "浴池(假日需预定),室内游泳池,普通停车场",
          "full_room" : true,
          "location" : {
            "lat" : 39.915343,
            "lon" : 116.4239
          },
          "praise" : 20
        }
      },
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "005",
        "_score" : 1.0,
        "_source" : {
          "title" : "文雅精选酒店",
          "city" : "北京",
          "price" : 800.0,
          "create_time" : "2021-01-01 08:00:00",
          "amenities" : "浴池(假日需预定),wifi,室内游泳池,普通停车场",
          "full_room" : true,
          "location" : {
            "lat" : 39.918229,
            "lon" : 116.422011
          },
          "praise" : 20
        }
      }
    ]
  }
}

三、开始查询

1、返回指定字段

很多场景我们并不需要返回所有的字段,比如在列表页的时候我们可能只需要返回 标题、价格而不需要把所的字段都查询出来,当然你也可以这么做,只不过这么做会带来一些性能上的损耗,这部分损耗包含从ES查询、从ES返回到客户端,从后端返回给前端等等,所以我们这里做指定字段的查询。使用到的关键字是**_source**

命令:

GET /hotel/_search
{
  "_source": ["title","price"]
}


结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "001",
        "_score" : 1.0,
        "_source" : {
          "price" : 556.0,
          "title" : "文雅酒店"
        }
      },
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "003",
        "_score" : 1.0,
        "_source" : {
          "price" : 200.0
        }
      },
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "004",
        "_score" : 1.0,
        "_source" : {
          "price" : 500.0,
          "title" : "金都酒店"
        }
      },
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "005",
        "_score" : 1.0,
        "_source" : {
          "price" : 800.0,
          "title" : "文雅精选酒店"
        }
      }
    ]
  }
}

可以看到ES值返回了价格和标题,其余字段并没有返回。这个搜索就等价于Mysql中的
Select title,price from hotel
2、计数查询

有些场景我们只想知道有多少符合条件的数据,而不需要知道确定的数据是什么,此时就可以用ES提供的计数查询来做。例如我们想知道城市为北京的数据有多少条。

GET /hotel/_count
{
  "query": {
    "term": {
      "city": {
        "value": "北京"
      }
    }
  }
}


结果:
{
  "count" : 2,
  //下面这个是分片信息可以暂时忽略
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}


可以看到结果只返回了满足条件的数据的数量,而并没有返回具体是那两条数据。这里用了query 和 term
这些后面也会讲到。这个搜索等价于Mysql中的
SELECT count(*) FROM hotel WHERE city = '北京'
3、分页查询

分页查询,这个也很好理解,我们不可能一次性把所有数据都给客户,一方面是性能会很差,另一方面客户或许根本不关心这么多数据,所以此时我们需要将查询结果分页,让用户有选择的查看数据,话不多说直接上代码

GET /hotel/_search
{
  "from": 0,
  "size": 1,
  "query": {
    "term": {
      "city": {
        "value": "北京"
      }
    }
  }
}
结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.87546873,
    "hits" : [
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "004",
        "_score" : 0.87546873,
        "_source" : {
          "title" : "金都酒店",
          "city" : "北京",
          "price" : 500.0,
          "create_time" : "2021-02-18 08:00:00",
          "amenities" : "浴池(假日需预定),室内游泳池,普通停车场",
          "full_room" : true,
          "location" : {
            "lat" : 39.915343,
            "lon" : 116.4239
          },
          "praise" : 20
        }
      }
    ]
  }
}



默认ES是只返回10条数据,我们可以通过from和size来设置分页参数,当ES默认最大返回值是
1000。当然这个数字是可以改的,在创建索引或者修改索引时设置settings index中有一个
max_result_window属性

PUT /hotel/_settings
{
  "index":{
    "max_result_window":2000
  }
}

这个搜索相当于Mysql中的
SELECT * FROM hotel WHERE city = '北京' limit 0,1

留个坑:ES会有深分页的问题,这个问题会放到后续的文章中讲述!!

4、性能分析

虽然ES是一个很强的搜索引擎,但是如果DSL写的过于抽象(过于烂)或者说索引设计的不合理也会导致ES搜索变慢,那么该如何解决呢?那肯定是要先知道那慢了,此时我们就可以用ES给我们提供的性能分析命令。我们还是以上面的DSL作为例子

DSL:
GET /hotel/_search
{
  "profile": true,
   "query": {
      "match": {
        "title": "金都"
      }
  }
}

结果:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 2.0834165,
    "hits" : [
      {
        "_index" : "hotel",
        "_type" : "_doc",
        "_id" : "004",
        "_score" : 2.0834165,
        "_source" : {
          "title" : "金都酒店",
          "city" : "北京",
          "price" : 500.0,
          "create_time" : "2021-02-18 08:00:00",
          "amenities" : "浴池(假日需预定),室内游泳池,普通停车场",
          "full_room" : true,
          "location" : {
            "lat" : 39.915343,
            "lon" : 116.4239
          },
          "praise" : 20
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[dglsus9vTpGyHUlIqpKvlw][hotel][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "title:金 title:都",
                "time_in_nanos" : 2043800,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 1,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 6700,
                  "match" : 4000,
                  "next_doc_count" : 1,
                  "score_count" : 1,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 77200,
                  "advance_count" : 1,
                  "score" : 13300,
                  "build_scorer_count" : 2,
                  "create_weight" : 139400,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1803200
                },
                "children" : [
                  {
                    "type" : "TermQuery",
                    "description" : "title:金",
                    "time_in_nanos" : 187000,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 3,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 1,
                      "compute_max_score_count" : 3,
                      "compute_max_score" : 36400,
                      "advance" : 1100,
                      "advance_count" : 2,
                      "score" : 7400,
                      "build_scorer_count" : 3,
                      "create_weight" : 37800,
                      "shallow_advance" : 12500,
                      "create_weight_count" : 1,
                      "build_scorer" : 91800
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "title:都",
                    "time_in_nanos" : 43700,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 3,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 1,
                      "compute_max_score_count" : 3,
                      "compute_max_score" : 5400,
                      "advance" : 1800,
                      "advance_count" : 2,
                      "score" : 1000,
                      "build_scorer_count" : 3,
                      "create_weight" : 8400,
                      "shallow_advance" : 2400,
                      "create_weight_count" : 1,
                      "build_scorer" : 24700
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 7200,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 23900
              }
            ]
          }
        ],
        "aggregations" : [ ],
        "fetch" : {
          "type" : "fetch",
          "description" : "",
          "time_in_nanos" : 102400,
          "breakdown" : {
            "load_stored_fields" : 19700,
            "load_stored_fields_count" : 1,
            "next_reader" : 9000,
            "next_reader_count" : 1
          },
          "debug" : {
            "stored_fields" : [
              "_id",
              "_routing",
              "_source"
            ]
          },
          "children" : [
            {
              "type" : "FetchSourcePhase",
              "description" : "",
              "time_in_nanos" : 4600,
              "breakdown" : {
                "process_count" : 1,
                "process" : 3900,
                "next_reader" : 700,
                "next_reader_count" : 1
              },
              "debug" : {
                "fast_path" : 1
              }
            }
          ]
        }
      }
    ]
  }
}

这里的结果分析还是很冗长的,读起来也是有一点的难度的。索性我们可以借助kibana来帮我们分析

实话实说,笔者学到这里的时候还是很会看这个,后续再补充!

5、评分分析

ES是会对搜索条件进行评分的,如果用户没有指定按照那个字段进行排序,ES会使用自己的打分算法对文档进行排序,当然也可以人为干预,百度就是这么做的。有时候我们需要知道某个文档的具体打分详情,此时可以使用ES提供的explain来查询,例如

GET /hotel/_explain/002
{
  "query":{
    "match":{
      "title":"金都"
    }
  }
}

结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{
  "_index" : "hotel",
  "_type" : "_doc",
  "_id" : "002",
  "matched" : true,
  "explanation" : {
    "value" : 1.1689311,
    "description" : "sum of:",
    "details" : [
      {
        "value" : 0.58446556,
        "description" : "weight(title:金 in 0) [PerFieldSimilarity], result of:",
        "details" : [
          {
            "value" : 0.58446556,
            "description" : "score(freq=1.0), computed as boost * idf * tf from:",
            "details" : [
              {
                "value" : 2.2,
                "description" : "boost",
                "details" : [ ]
              },
              {
                "value" : 0.6931472,
                "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details" : [
                  {
                    "value" : 2,
                    "description" : "n, number of documents containing term",
                    "details" : [ ]
                  },
                  {
                    "value" : 4,
                    "description" : "N, total number of documents with field",
                    "details" : [ ]
                  }
                ]
              },
              {
                "value" : 0.38327527,
                "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details" : [
                  {
                    "value" : 1.0,
                    "description" : "freq, occurrences of term within document",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.2,
                    "description" : "k1, term saturation parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 0.75,
                    "description" : "b, length normalization parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 8.0,
                    "description" : "dl, length of field",
                    "details" : [ ]
                  },
                  {
                    "value" : 5.5,
                    "description" : "avgdl, average length of field",
                    "details" : [ ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value" : 0.58446556,
        "description" : "weight(title:都 in 0) [PerFieldSimilarity], result of:",
        "details" : [
          {
            "value" : 0.58446556,
            "description" : "score(freq=1.0), computed as boost * idf * tf from:",
            "details" : [
              {
                "value" : 2.2,
                "description" : "boost",
                "details" : [ ]
              },
              {
                "value" : 0.6931472,
                "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details" : [
                  {
                    "value" : 2,
                    "description" : "n, number of documents containing term",
                    "details" : [ ]
                  },
                  {
                    "value" : 4,
                    "description" : "N, total number of documents with field",
                    "details" : [ ]
                  }
                ]
              },
              {
                "value" : 0.38327527,
                "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details" : [
                  {
                    "value" : 1.0,
                    "description" : "freq, occurrences of term within document",
                    "details" : [ ]
                  },
                  {
                    "value" : 1.2,
                    "description" : "k1, term saturation parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 0.75,
                    "description" : "b, length normalization parameter",
                    "details" : [ ]
                  },
                  {
                    "value" : 8.0,
                    "description" : "dl, length of field",
                    "details" : [ ]
                  },
                  {
                    "value" : 5.5,
                    "description" : "avgdl, average length of field",
                    "details" : [ ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

这个过于复杂,现在了解即可

四、结束语

今天学习了ES中的查询,当然受限于篇幅,只是一部分。后续还会有更多的复杂搜索,希望对你有所帮助。

相关推荐
qq_q9922502775 分钟前
django基于django的民族服饰数据分析系统的设计与实现
后端·python·django
weixin_543662866 分钟前
BERT的中文问答系统32
python·深度学习·bert
Octopus207712 分钟前
【Linux】开发工具(yum)
linux·服务器·笔记·学习
禾乃儿_xiuer15 分钟前
《Python浪漫的烟花表白特效》
开发语言·python·表白·新手入门·表白代码
CT随15 分钟前
如何搭建一个vue2项目(完整步骤)
前端
木子七31 分钟前
vue2-脚手架
前端·vue
冰冻果冻40 分钟前
vue--响应式数据
前端·javascript·vue.js
青龙摄影41 分钟前
【识图】如何将本地的图片的数字或字母识别出来
服务器·前端·python
糊涂君-Q41 分钟前
Python小白学习教程从入门到入坑------习题课3(基础巩固)
python·学习·程序人生·职场和发展·学习方法·程序员创富·改行学it
橙某人1 小时前
Protocol Buffer (PB)协议阐述与实践
前端·javascript·架构