Using requests to Send Requests to Elasticsearch [Part 1]

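This article is the blogger's original work; reproduction or use without authorization is strictly prohibited.
Article link: https://blog.csdn.net/zyooooxie/article/details/123730279
I used to query ES data in the test environment with Kibana, but in the second half of the year it became unavailable for certain reasons, so I decided to do it with code instead. Here is a brief write-up.
[This post was actually published N months later than planned.]
Personal blog: https://blog.csdn.net/zyooooxie
[Everything below reflects only my own project experience; if yours differs, that is perfectly normal.]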
Elasticsearch

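Elasticsearch is an open-source distributed search and analytics engine for storing, searching and analyzing large volumes of data. It is built on the Apache Lucene library and exposes a simple RESTful API that makes it easy to index, search and analyze data.
Elasticsearch has the following characteristics:
Distributed architecture: Elasticsearch scales horizontally, spreading data across multiple nodes for high availability and better performance.
Real-time search: Elasticsearch supports near-real-time indexing and search and can return results within milliseconds.
Rich query options: Elasticsearch offers an extensive query syntax, including full-text search, exact matching and range queries.
Multiple data types: Elasticsearch can index and search many kinds of data, including text, numbers, dates and geolocation.
Distributed data storage: Elasticsearch stores data using shards and replicas to ensure reliability and high availability.
Real-time analytics: Elasticsearch provides powerful aggregations for computing statistics and analyses over the data in real time.
Extensibility: Elasticsearch integrates with other tools and frameworks such as Logstash and Kibana for end-to-end data processing and visualization.
In short, Elasticsearch is a powerful, easy-to-use and scalable search and analytics engine suited to many scenarios, including log analysis, e-commerce and real-time monitoring.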
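Because everything goes through the RESTful API, a search response is just JSON that can be picked apart with plain dict access. Below is a minimal sketch of that idea. The helper names (build_search_url, extract_sources, total_hits), the localhost host and the sample response are all made up for illustration and are not part of my project code; the response shape assumed here is the usual hits / total / _source layout.

```python
from typing import Any, Dict, List


def build_search_url(host: str, index: str) -> str:
    """Join host and index into a typeless _search endpoint (hypothetical helper)."""
    return '/'.join([host.rstrip('/'), index, '_search'])


def extract_sources(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the _source of every hit in a search response body."""
    return [hit.get('_source', {}) for hit in body.get('hits', {}).get('hits', [])]


def total_hits(body: Dict[str, Any]) -> int:
    """Read hits.total, which is an object in 7.x and a bare integer in older versions."""
    total = body.get('hits', {}).get('total', 0)
    return total['value'] if isinstance(total, dict) else total


# Hand-written sample that only mimics the shape of a response; it is not real data.
sample_body = {
    'hits': {
        'total': {'value': 2, 'relation': 'eq'},
        'hits': [
            {'_index': 'ABC-data', '_id': '1', '_score': 1.0, '_source': {'seq': 5}},
            {'_index': 'ABC-data', '_id': '2', '_score': 1.0, '_source': {'seq': 6}},
        ],
    }
}

print(build_search_url('http://localhost:9200/', 'ABC-data'))
print(total_hits(sample_body), extract_sources(sample_body))
```

With a live cluster, the same helpers would be fed by something like requests.get(url, auth=..., params={'q': 'seq:5'}).json().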
Search APIs

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search.html
"""
@blog: https://blog.csdn.net/zyooooxie
@qq: 153132336
@email: zyooooxie@gmail.com
"""

import random
import time

import requests

from requests_toolbelt.utils import dump

from XXX.common_es import gl_es_auth, gl_es_host_new
from user_log import Log
from XXX.practice_es_2 import es_send_request

gl_fc_room = 'TESTM7eY-sPd4wxA'  # gets split into multiple tokens by the analyzer
gl_No_fc_room = 'TESTDbrMoRzVuuXg'

gl_index = 'ABC-data'
gl_type = '_doc'  # note: a type in the search path is deprecated in 7.x and no longer accepted in 8.x
gl_url = '/'.join([gl_es_host_new, gl_index, gl_type, '_search'])


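# The most important part of the response is hits: it contains a total field with the number of matching documents, and a hits array holding the first ten matching documents.
# Each entry in the hits array carries the document's _index, _type and _id plus the _source field, so the whole document can be used straight from the search response.


# In Elasticsearch, relevance is expressed as a floating-point number returned in each hit as _score; by default results are sorted by _score in descending order.
# _score measures how well the document matches the query.


# If the search is not restricted to a particular index or type, every document in the cluster is searched. Elasticsearch forwards the request to a primary or replica copy of every shard, gathers the top 10 results overall, and returns them.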


def test_Search_1():
    """
    Query-string search
    :return:
    """
    # https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-search.html#search-search-api-query-params

    # GET /{index}/{type}/_search?q=XXX
    # Pass the query itself in the q= parameter

    s1 = {'q': 'seq:5'}  # returns every document whose seq equals 5
    s2 = {'q': '错对123'}  # returns documents containing 错对123 (ten by default)

    # Elasticsearch accepts the from and size parameters: size is the number of results to return (default 10); from is the number of initial results to skip (default 0)
    s3 = {'q': 'text.content:对错123', 'size': 15, 'from': 10}

    s4 = {'q': 'text.content:(456 abc)', 'size': 15}  # search for several values

    s5 = {'q': 'seq:>99 text.content:错对无所谓'}  # several conditions, including seq greater than 99

    # Use the sort parameter to sort by a field's value
    s6 = {"size": 15, "from": 10, "sort": "seq:desc"}
    s7 = {"size": 15, "from": 10, "sort": "seq"}
    s8 = {"size": 15, "from": 10}

    s9 = {"q": "无所谓ABC", "size": 15, "sort": "seq"}  # a field sorts ascending by default, whereas _score sorts descending by default

    Log.info('********')

    search_list = [s6, s7, s8, s9]
    # search_list = [s1, s2]
    # search_list = [s3, s4, s5]

    for s in search_list:
        # HTTP Basic authentication
        res = requests.get(gl_url, auth=gl_es_auth, params=s)

        Log.info(dump.dump_all(res).decode('utf-8'))
        res.close()

        time.sleep(1)

        Log.error('🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀')

    # An empty search: no query specified at all
    res = requests.get(gl_url, auth=gl_es_auth)

    Log.info(dump.dump_all(res).decode('utf-8'))
    res.close()


def test_Search_2():
    """
    Sorting and pagination
    :return:
    """
    # https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-search.html#search-search-api-request-body

    # Use the sort parameter to sort by a field's value

    s11 = {'sort': 'seq'}  # no query passed
    s12 = {'sort': {'seq': {'order': 'desc'}}}

    # wm61TEST
    s13 = {'query': {'match': {'from': 'wm61TEST'.lower()}},
           'sort': {'seq': {'order': 'asc'}}}
    s14 = {'query': {'match': {'from': 'wm61TEST'.lower()}},
           'sort': {'seq': {'order': 'desc'}}}

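    # Multi-level sorting
    # Results are ordered by the first criterion; the second is consulted only when the first sort values are identical, and so on.
    s15 = {'sort': [{'msgTime': {"order": "desc"}}, {'seq': {"order": "asc"}}]}

    # Each hit now carries a new element named sort, holding the values used for sorting.
    # Also, _score and max_score are both null: _score is not computed because it is not used for sorting.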

    Log.info('********')

    # Use the from and size parameters to paginate
    s21 = {'size': 15, 'from': 10}
    s22 = {'query': {'match': {'from': 'wm61TEST'.lower()}}, 'size': 15, 'from': 10}
    s23 = {'size': 15, 'from': 10, 'sort': 'seq'}
    s24 = {'size': 15, 'from': 10, 'sort': {'seq': {"order": "desc"}},
           'query': {'regexp': {'from': 'wm61TEST.+'.lower()}}}

    Log.info('********')

    # search_list = [s11, s12, s13, s14]
    # search_list = [s15]
    search_list = [s21, s22, s23, s24]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)


# https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-search.html
# https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html
def test_query_1():
    """
    Empty query, match_all, match, term
    :return:
    """

    s0 = {}  # empty query

    Log.info('********')

    # Query DSL
    # Simply pass the query clause in the query parameter

    # GET /_search
    # {
    #     "query": YOUR_QUERY_HERE
    # }

    # A query clause can take one of these forms:

    # 1. Leaf clauses (such as match) compare a query string against one field (or several fields).
    # {
    #     QUERY_NAME: {
    #         ARGUMENT: VALUE,
    #         ARGUMENT: VALUE,...
    #     }
    # }

    # 2. Compound clauses mainly combine other query clauses. For example, a bool clause lets you combine other clauses as needed,
    # whether as must, must_not or should matches, and it can also contain non-scoring filters.

    Log.info('********')

    # The match_all query simply matches every document

    s1 = {'query': {'match_all': {}}}

    Log.info('********')

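    # match query

    # On an exact-value field (a number, date, boolean, or a keyword / not_analyzed string field) it matches the given value exactly;
    # on a full-text field, match first runs the query string through the appropriate analyzer before executing the query.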

    s11 = {'query': {"match": {"seq": 5}}}
    s12 = {'query': {"match": {"msgTime": 1702883838000}}}

    s13 = {'query': {"match": {"roomId": gl_fc_room}}}
    s14 = {'query': {"match": {"roomId": gl_No_fc_room}}}

    Log.info('********')

    # term query

    # term is for exact-value matching; the values may be numbers, dates, booleans, or keyword (not_analyzed) strings.
    # term does not analyze its input, so it looks up exactly the value given.

    s21 = {'query': {"term": {"seq": 5}}}  # 'seq': {'type': 'long'}
    s22 = {'query': {"term": {"msgTime": 1702883838000}}}  # 'msgTime': {'type': 'float'}

    s23 = {'query': {"term": {"roomId": gl_fc_room}}}
    s24 = {'query': {"term": {"roomId": gl_No_fc_room}}}

    Log.info('********')

    # Avoid using the term query for text fields.
    # To search text field values, use the match query instead.

    # To better search text fields, the match query also analyzes your provided search term before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term.

    s25 = {'query': {"term": {"roomId.keyword": gl_fc_room}}}
    s26 = {'query': {"term": {"roomId.keyword": gl_No_fc_room}}}

    Log.info('********')

    # wrcIqjVAAAXbgIHkkXtestAuO-cTEST
    test_1 = {'query': {"match": {"roomId": 'wrciqjvaaaxbgihkkxtestauo'}}}
    test_2 = {'query': {"match": {"roomId": 'wrcIqjVAAAXbgIHkkXtestAuO'}}}

    test_3 = {'query': {"term": {"roomId": 'wrciqjvaaaxbgihkkxtestauo'}}}
    test_4 = {'query': {"term": {"roomId": 'wrcIqjVAAAXbgIHkkXtestAuO'}}}

    Log.info('********')

    # search_list = [s21, s22, s23, s24, s25, s26]
    # search_list = [s0, s1]
    # search_list = [s11, s12, s13, s14]
    search_list = [test_1, test_2, test_3, test_4]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)


def test_query_2():
    """
    range   terms   constant_score
    :return:
    """

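    # range query
    # Finds numbers or dates that fall within the given interval.
    # Returns documents that contain terms within a provided range.

    # Supported operators:
    # gt   greater than
    # gte  greater than or equal to
    # lt   less than
    # lte  less than or equal to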

    s11 = {'query': {"range": {"seq": {'lt': 22}}}}

    # On date fields, the range query supports date math
    # https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-range-query.html#ranges-on-dates
    s12 = {'query': {'range': {'ABC': {'lt': 'now-10h'}}}}
    s13 = {'query': {'range': {'ABC': {'lt': '2023-10-10 00:00:00', "gte": "2020-01-01T00:00:00"}}}}
    s14 = {'query': {'range': {'ABC': {'lt': '2023-10-10 00:00:00||+1M'}}}}  # append a double pipe (||) to a date, followed by a date math expression

    Log.info('********')

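    # terms query
    # Returns documents that contain one or more exact terms in a provided field.

    # terms lets you match against several values: a document qualifies if the field contains any one of them.

    # Like term, terms does not analyze its input; it looks for exactly matching values (case, accents and whitespace all matter).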

    s21 = {'query': {'terms': {'seq': [10, 20, 30]}}}
    s22 = {'query': {'terms': {'text.content': ['123', 'abcDE', '你好']}}}

    Log.info('********')

    # constant_score query
    # When looking up an exact value we usually do not want scoring, only inclusion or exclusion of documents; constant_score can replace a bool query that has nothing but a filter clause

    s30 = {'query': {'term': {'seq': 5}}}
    s31 = {'query': {'bool': {'filter': {'term': {'seq': 5}}}}}
    s32 = {'query': {'constant_score': {'filter': {'term': {'seq': 5}}}}}

    # Every matching document gets the same constant score (1 by default)
    s33 = {'query': {'constant_score': {'filter': {'terms': {'seq': [5, 15, 25, 35]}}}}}
    s34 = {'query': {'constant_score': {'filter': {'range': {'seq': {'gt': 22}}}}}}

    Log.info('********')

    search_list = [s30, s31, s32, s33, s34]
    # search_list = [s21, s22]
    # search_list = [s11]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)


def test_query_3():
    """
    match_phrase    multi_match     multi-term match queries
    :return:
    """

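    # Phrase matching [search terms adjacent to each other]
    # The match_phrase query analyzes the text and creates a phrase query out of the analyzed text.

    # match_phrase first analyzes the query string into a list of terms and searches for them, but keeps only documents that contain all of the terms in the same relative positions as in the query.
    # It matches all of the given words in the same relative order.

    # For a document to match the phrase quick brown fox, all of the following must hold:
    # quick, brown and fox must all appear in the field.
    # The position of brown must be 1 greater than that of quick.
    # The position of fox must be 2 greater than that of quick.
    # If any of these conditions fails, the document is not a match.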

    s11 = {'query': {'match_phrase': {'text.content': '安全 你好'}}}  # ['安', '全', '你', '好']

    # When a string is analyzed, the analyzer returns not only a list of terms but also each term's position (order) in the original string

    s12 = {'query': {'match_phrase': {'text.content': '希望 技。@术 你好#!！'}}}  # ['希', '望', '技', '术', '你', '好']

    Log.info('********')

    # multi_match query
    # Runs the same match query against several fields.
    # The multi_match query builds on the match query to allow multi-field queries

    # Field names may contain wildcards: every field whose name matches the wildcard pattern is included in the search
    s21 = {'query': {'multi_match': {'query': 11, 'fields': '*eq'}}}  # "seq":{"type":"long"}

    # "msgId":{"type":"text"    "text":{"properties":{"content":{"type":"text",
    s22 = {'query': {'multi_match': {'query': 'ABc', 'fields': ['msgId', 'text.content']}}}

    # "msgTime":{"type":"long"}
    s23 = {'query': {'multi_match': {'query': 1702877305000, 'fields': ['msgTime', 'seq']}}}

    Log.info('********')

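    # match can search for several terms at once

    # A document matches if its text.content field contains at least one of the given terms; the more terms it matches, the more relevant it is.
    s31 = {'query': {'match': {'text.content': '科技公司 123 你好 深圳'}}, 'size': 20}

    # match also accepts an operator parameter, which defaults to or.
    # Setting it to and requires every given term to match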
    s41 = {'query': {'match': {'text.content': {'query': '科技公司 123 你好 深圳 安全', 'operator': 'and'}}}, 'size': 500}

    Log.info('********')

    search_list = [s21, s22, s23]
    # search_list = [s31, s41]
    # search_list = [s11, s12]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)


def test_query_4():
    """
    prefix  wildcard    regexp
    :return:
    """
    # prefix, wildcard and regexp operate on terms: when run against an analyzed (text) field they check each term in the field rather than treating the field as a whole.

    Log.info('********')

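    # prefix query
    # Returns documents that contain a specific prefix in a provided field.

    # prefix is a low-level, term-level query: it does not analyze the query string before searching and assumes the prefix passed in is exactly the prefix to look for.
    # By default, prefix does no relevance scoring; it returns every matching document with a score of 1.

    # The shorter the prefix, the more terms must be visited. Using W as the prefix instead of W1 could mean matching against tens of millions of terms.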

    # wm61TEST
    s11 = {'query': {'prefix': {'from': 'wm61TEST2AAA40'}}}
    s12 = {'query': {'prefix': {'from': 'wm61TEST2AAA40'.lower()}}}  # the from field was analyzed when it was indexed

    # 9dea20TESTb4dcafbe396a1aa7
    s13 = {'query': {'prefix': {'msgId': {'value': '9dea20TEST'}}}}

    Log.info('********')

    # wildcard query
    # Returns documents that contain terms matching a wildcard pattern.

    # ? which matches any single character
    # * which can match zero or more characters, including an empty one
    # These are the standard shell-style wildcards: ? matches any single character, * matches zero or more characters.

    s21 = {'query': {'wildcard': {'msgId': {'value': '9dea20TEST*'}}}}

    Log.info('********')

    # regexp query
    # Avoid patterns with a leading wildcard (such as *foo, or .*foo as a regular expression)
    # https://www.elastic.co/guide/en/elasticsearch/reference/7.17/regexp-syntax.html#regexp-standard-operators

    s22 = {'query': {'regexp': {'from': 'wm61TEST2AAA40.+'.lower()}}}

    Log.info('********')

    # search_list = [s11, s12, s13]
    search_list = [s21, s22]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)


def test_query_5():
    """
    Compound (bool) queries
    :return:
    """

    # bool query (combines several queries into one)
    # {
    #    "bool" : {
    #       "must" :     [],
    #       "should" :   [],
    #       "must_not" : [],
    #    }
    # }

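    # It accepts the following parameters:
    # must      The document must match these clauses to be included. [equivalent to AND]
    # must_not  The document must not match these clauses to be included. [equivalent to NOT]
    # should    The document should match at least one of these clauses. [equivalent to OR]

    # filter    The document must match, but in non-scoring filter mode: these clauses contribute nothing to the score and only include or exclude documents.

    # Requirement: msgTime equals 1703049007000.0 and seq is greater than 140000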
    s11 = {'query': {'bool': {

        'must':
            [{'term': {'msgTime': 1703049007000.0}},
             {'range': {'seq': {'gt': 140000}}}]
    }}}

    s12 = {'query': {'bool': {

        'must': {'match': {'msgTime': 1703049007000.0}},
        'filter': {'range': {'seq': {'gt': 140000}}}

    }}}  # the range query has moved into the filter clause

    Log.info('********')

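    # Every must clause has to match and every must_not clause has to not match, but how many should clauses have to match?
    # By default none of them is required, with one exception: when the bool query has no must (or filter) clause, at least one should clause must match.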

    s21 = {'query': {'bool': {
        'must': {'term': {'msgTime': 1703049007000.0}},
        'must_not': {'range': {'seq': {'lte': 140000}}}
    }}}

    s31 = {'query': {'bool': {
        'must': {'term': {'msgTime': 1703049007000.0}},
        'must_not': {'range': {'seq': {'lte': 140000}}},
        'should': {'term': {'seq': 1}}
    }}}  # must is present, so the should clause is not required to match

    s32 = {'query': {'bool': {
        'must': {'range': {'seq': {'gte': 140000}}},
        'should': {'terms': {'seq': [1, 11, 111]}}
    }}}  # must is present, so the should clause is not required to match

    s40 = {'query': {'bool': {
        'must_not': {'range': {'seq': {'lte': 140000}}},
        'should': {'term': {'seq': 1}}
    }}}  # should has to match here, but its seq value conflicts with must_not

    s41 = {'query': {'bool': {
        'must_not': {'range': {'seq': {'lte': 140000}}},
        'should': {'terms': {'seq': [140050, 140040]}}
    }}}  # should has to match here, and these seq values can be found

    Log.info('********')

    # search_list = [s11, s12]
    search_list = [s21, s31, s32, s40, s41]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)

    # For exact-value lookups we use filters. Filters matter because they execute very fast, skip relevance computation (the whole scoring phase is bypassed) and are easily cached.

def search_send_get_post(url, auth, json_data) -> dict:
    """

    :param url:
    :param auth:
    :param json_data:
    :return:
    """
    time.sleep(1)

    # For a search request, the Elasticsearch engineers prefer GET, because they feel it describes retrieving information better than POST does.
    # However, since GET with a request body is not universally supported, the search API also accepts POST.

    if random.getrandbits(1):

        res = requests.get(url, auth=auth, json=json_data)

    else:
        res = requests.post(url, auth=auth, json=json_data)

    Log.info(dump.dump_all(res).decode('utf-8'))
    # Log.info(dump.dump_response(res).decode('utf-8'))
    # Log.info(res.request.__dict__)

    result = res.json()

    res.close()

    Log.error('🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀🚀')

    return result


def test_query_7():
    """
    exists  ids
    :return:
    """

    # Exists query
    # Returns documents that contain an indexed value for a field.
    s11 = {'query': {'exists': {'field': 'to89794'}}}

    s12 = {'query': {'exists': {'field': 'msgType'}}}

    Log.info('********')

    # IDs
    # Returns documents based on their IDs. This query uses document IDs stored in the _id field.
    s21 = {'query': {
        'ids': {'values': ['x2180064b79bc-test-11ee-TEST-afbe396a1aa7', 'xie76cdec3731-test-11ee-TEST-f6e750ebe190']}}}

    s22 = {'query': {
        'ids': {'values': ['xie76cdec3731-test-11ee-TEST-f6e750ebe190']}}}

    Log.info('********')

    # search_list = [s11, s12, ]
    search_list = [s21, s22, ]

    for sl in search_list:
        search_send_get_post(url=gl_url, auth=gl_es_auth, json_data=sl)


def test_search_validate(query: dict, index_: str = gl_index):
    """

    :param query:
    :param index_:
    :return:
    """

    # The validate-query API can be used to check whether a query is valid
    # The validate API allows you to validate a potentially expensive query without executing it.

    url1 = '/'.join([gl_es_host_new, '_validate/query'])

    # The explain parameter gives more detail on why a query is invalid; for a valid query it returns a human-readable description
    url2 = '/'.join([gl_es_host_new, '_validate/query?explain=true'])  # every index queried returns its own explanation
    url2_ = '/'.join([gl_es_host_new, '_validate/query?explain'])

    url3 = '/'.join([gl_es_host_new, index_, '_validate/query?explain'])

    # query
    # (Optional, query object) Defines the search definition using the Query DSL.
    assert query.get('query') is not None

    for u in [url1, url2, url2_, url3, ]:
        es_send_request('get', u, data_dict=query)


if __name__ == '__main__':
    pass
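
The bool and from/size examples in the script above are all hand-written dicts. If you build many of them, a small builder can cut down on typos. The sketch below is only one way to do it: bool_query and with_page are hypothetical helpers I made up for illustration, not part of my project code and not an Elasticsearch client API. They only assemble the request body, so they run without a cluster.

```python
from typing import Any, Dict, List, Optional

Clause = Dict[str, Any]


def bool_query(must: Optional[List[Clause]] = None,
               filter_: Optional[List[Clause]] = None,
               should: Optional[List[Clause]] = None,
               must_not: Optional[List[Clause]] = None) -> Dict[str, Any]:
    """Build a {'query': {'bool': ...}} body, leaving out empty clause lists."""
    clauses = {'must': must, 'filter': filter_, 'should': should, 'must_not': must_not}
    return {'query': {'bool': {name: value for name, value in clauses.items() if value}}}


def with_page(body: Dict[str, Any], page: int, page_size: int = 10) -> Dict[str, Any]:
    """Return a copy of body with from/size set for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError('page and page_size must be positive')
    return {**body, 'from': (page - 1) * page_size, 'size': page_size}


# Same intent as s12 above: msgTime must match, seq > 140000 as a non-scoring filter,
# here asking for the second page of 15 hits.
body = with_page(
    bool_query(must=[{'term': {'msgTime': 1703049007000.0}}],
               filter_=[{'range': {'seq': {'gt': 140000}}}]),
    page=2, page_size=15)
print(body)
```

The resulting dict can be passed straight to search_send_get_post as json_data. Keep in mind that from + size is capped by the index setting index.max_result_window (10,000 by default), so deep paging needs another approach.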