Elasticsearch: Searching with query rules

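In the earlier article "Introducing query rules in Elasticsearch 8.10", we described in detail how to use query rules for search. This interactive notebook shows how to work with query rules through the official Elasticsearch Python client. You will use the query rules API to store query rules in Elasticsearch, and then query them with rule_query.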

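Installation

Install Elasticsearch and Kibana

If you have not yet installed your own Elasticsearch and Kibana, please refer to the following articles to install them:

Choose Elastic Stack 8.x when installing. During installation, we can see the following installation information: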

Environment variables

Before starting Jupyter, we set the following environment variables:

export ES_USER="elastic"
export ES_PASSWORD="xnLj56lTrH98Lf_6n76y"
export ES_ENDPOINT="localhost"

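Change the values above to match your own deployment. The exports must be run in the same shell before you start jupyter, otherwise the notebook will not see them.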
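Because the notebook reads these variables with os.getenv, a missing one tends to surface only later as a confusing connection error. The following is a minimal sketch of an up-front check; the helper name missing_env_vars is hypothetical and not part of the original notebook.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this in the first notebook cell shows right away whether Jupyter was started from a shell that has the exports.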

Copy the Elasticsearch certificate

We copy the Elasticsearch certificate into the current directory:

$ pwd
/Users/liuxg/python/elser
$ cp ~/elastic/elasticsearch-8.12.0/config/certs/http_ca.crt .
$ ls http_ca.crt
http_ca.crt

Install the Python dependencies

python3 -m pip install -qU elasticsearch python-dotenv

Prepare the data

We create the following data file in the project's current directory:

query-rules-data.json

[
  {
    "id": "us1",
    "content": {
      "name": "PureJuice Pro",
      "description": "PureJuice Pro: Experience the pinnacle of wireless charging. Blending rapid charging tech with sleek design, it ensures your devices are powered swiftly and safely. The future of charging is here.",
      "price": 15.00,
      "currency": "USD",
      "plug_type": "B",
      "voltage": "120v"
    }
  },
  {
    "id": "uk1",
    "content": {
      "name": "PureJuice Pro - UK Compatible",
      "description": "PureJuice Pro: Redefining wireless charging. Seamlessly merging swift charging capabilities with a refined aesthetic, it guarantees your devices receive rapid and secure power. Welcome to the next generation of charging.",
      "price": 20.00,
      "currency": "GBP",
      "plug_type": "G",
      "voltage": "230V"
    }
  },
  {
    "id": "eu1",
    "content": {
      "name": "PureJuice Pro - Wireless Charger suitable for European plugs",
      "description": "PureJuice Pro: Elevating wireless charging. Combining unparalleled charging speeds with elegant design, it promises both rapid and dependable energy for your devices. Embrace the future of wireless charging.",
      "price": 18.00,
      "currency": "EUR",
      "plug_type": "C",
      "voltage": "230V"
    }
  },
  {
    "id": "preview1",
    "content": {
      "name": "PureJuice Pro - Pre-order next version",
      "description": "Newest version of the PureJuice Pro wireless charger, coming soon! The newest model of the PureJuice Pro boasts a 2x faster charge than the current model, and a sturdier cable with an eighteen month full warranty. We also have a battery backup to charge on-the-go, up to two full charges. Pre-order yours today!",
      "price": 36.00,
      "currency": "USD",
      "plug_type": ["B", "C", "G"],
      "voltage": ["230V", "120V"]
    }
  }
]
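Each entry in the file pairs an id with a content object, and the indexing step later relies on that shape. As a rough sketch, a check like the one below can catch a malformed entry before anything is sent to Elasticsearch. The helper find_problems and the shortened inline sample are illustrative assumptions, not part of the original notebook.

```python
import json

# Fields every product in this article's data file carries.
EXPECTED_FIELDS = {"name", "description", "price", "currency", "plug_type", "voltage"}


def find_problems(docs):
    """Return human-readable problems found in a list of product documents."""
    problems = []
    seen_ids = set()
    for position, doc in enumerate(docs):
        doc_id = doc.get("id")
        if not doc_id:
            problems.append(f"document {position} has no id")
        elif doc_id in seen_ids:
            problems.append(f"duplicate id {doc_id}")
        seen_ids.add(doc_id)
        missing = EXPECTED_FIELDS - set(doc.get("content", {}))
        if missing:
            problems.append(f"{doc_id}: missing fields {sorted(missing)}")
    return problems


# A shortened inline sample stands in for query-rules-data.json here.
sample = json.loads("""
[
  {"id": "us1", "content": {"name": "PureJuice Pro", "description": "...",
   "price": 15.00, "currency": "USD", "plug_type": "B", "voltage": "120v"}},
  {"id": "uk1", "content": {"name": "PureJuice Pro - UK Compatible"}}
]
""")
for problem in find_problems(sample):
    print(problem)
```

To check the real file, load it with json.load and pass the resulting list to find_problems; an empty list means every entry has the expected shape.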

Create and run the application

We run the following command in the current directory to start the notebook:

$ pwd
/Users/liuxg/python/elser
$ jupyter notebook

Import packages and connect to Elasticsearch

from elasticsearch import Elasticsearch
from dotenv import load_dotenv
import os

# Load variables from a .env file, if one exists, into the environment
load_dotenv()

elastic_user = os.getenv("ES_USER")
elastic_password = os.getenv("ES_PASSWORD")
elastic_endpoint = os.getenv("ES_ENDPOINT")

# Connect over HTTPS and verify the server against the copied CA certificate
url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
client = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

print(client.info())
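The connection URL above embeds the user name and password directly. That works for the sample password, but a password containing characters such as "@", ":" or "/" would produce an ambiguous URL. A minimal sketch of percent-encoding the credentials first is shown below; build_es_url is a hypothetical helper, and the sketch assumes the client decodes percent-encoded credentials, as is usual for the userinfo part of a URL.

```python
from urllib.parse import quote


def build_es_url(user, password, endpoint, port=9200):
    """Build an https URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{endpoint}:{port}"


print(build_es_url("elastic", "p@ss:w/rd", "localhost"))
```

Another option is to keep the credentials out of the URL entirely and pass them through the client's basic_auth argument.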

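Index some test data

Our client is now set up and connected to our Elastic deployment. Next we need some data to test the basics of Elasticsearch queries. We will use a small product index with the following fields: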

  • name
  • description
  • price
  • currency
  • plug_type
  • voltage

Run the following to upload some sample data:

import json

# Load the sample products from the JSON file
with open("query-rules-data.json") as f:
    docs = json.load(f)

# The bulk API expects pairs of entries: an action line, then the document body
operations = []
for doc in docs:
    operations.append({"index": {"_index": "products_index", "_id": doc["id"]}})
    operations.append(doc["content"])

# refresh=True makes the documents searchable as soon as the call returns
client.bulk(index="products_index", operations=operations, refresh=True)
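The operations list alternates between an action entry and the document body it applies to, which is easy to get wrong when editing the loop. A small sketch that isolates this step in a pure function and prints the resulting pairs is shown below; build_bulk_operations and the trimmed sample documents are illustrative assumptions.

```python
def build_bulk_operations(docs, index_name):
    """Turn [{'id': ..., 'content': {...}}, ...] into a flat bulk operations list."""
    operations = []
    for doc in docs:
        # Action entry: which index and document ID to write to.
        operations.append({"index": {"_index": index_name, "_id": doc["id"]}})
        # Source entry: the document body itself.
        operations.append(doc["content"])
    return operations


sample_docs = [
    {"id": "us1", "content": {"name": "PureJuice Pro", "currency": "USD"}},
    {"id": "uk1", "content": {"name": "PureJuice Pro - UK Compatible", "currency": "GBP"}},
]
for entry in build_bulk_operations(sample_docs, "products_index"):
    print(entry)
```

Two documents produce four entries, and the list returned here is the kind of value passed as operations to client.bulk.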

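We can inspect the data in Kibana:

Search the test data

First, let's search the data for "reliable wireless charger".

Before searching, we define a couple of convenience functions that print the raw JSON response from Elasticsearch in a more readable form.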

def pretty_response(response):
    """Print every hit of a search response in a readable form."""
    if len(response["hits"]["hits"]) == 0:
        print("Your search returned no results.")
    else:
        for hit in response["hits"]["hits"]:
            doc_id = hit["_id"]
            score = hit["_score"]
            name = hit["_source"]["name"]
            description = hit["_source"]["description"]
            price = hit["_source"]["price"]
            currency = hit["_source"]["currency"]
            plug_type = hit["_source"]["plug_type"]
            voltage = hit["_source"]["voltage"]
            pretty_output = (f"\nID: {doc_id}\nName: {name}\nDescription: {description}\nPrice: {price}\nCurrency: {currency}\nPlug type: {plug_type}\nVoltage: {voltage}\nScore: {score}")
            print(pretty_output)


def pretty_ruleset(response):
    """Print every rule of a ruleset with its criteria and pinned ids."""
    print("Ruleset ID: " + response["ruleset_id"])
    for rule in response["rules"]:
        rule_id = rule["rule_id"]
        rule_type = rule["type"]
        print(f"\nRule ID: {rule_id}\n\tType: {rule_type}\n\tCriteria:")
        criteria = rule["criteria"]
        for rule_criteria in criteria:
            criteria_type = rule_criteria["type"]
            metadata = rule_criteria["metadata"]
            values = rule_criteria["values"]
            print(f"\t\t{metadata} {criteria_type} {values}")
        ids = rule["actions"]["ids"]
        print(f"\tPinned ids: {ids}")

Next, we run the search.

A normal search without query rules

# "name^5" weights matches in the name field five times higher than the description
response = client.search(index="products_index", query={
    "multi_match": {
        "query": "reliable wireless charger for iPhone",
        "fields": ["name^5", "description"]
    }
})

pretty_response(response)

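Create query rules

Let's assume we know which country each user comes from (perhaps through geolocation from the IP address, or from the logged-in user's account information). We now want to create query rules so that, whenever someone searches for anything containing the phrase "wireless charger", the wireless charger that fits their country is promoted in the results.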

client.query_ruleset.put(ruleset_id="promotion-rules", rules=[
    {
      "rule_id": "us-charger",
      "type": "pinned",
      "criteria": [
        {
          "type": "contains",
          "metadata": "my_query",
          "values": ["wireless charger"]
        },
        {
          "type": "exact",
          "metadata": "country",
          "values": ["us"]
        }
      ],
      "actions": {
        "ids": [
          "us1"
        ]
      }
    },
    {
      "rule_id": "uk-charger",
      "type": "pinned",
      "criteria": [
        {
          "type": "contains",
          "metadata": "my_query",
          "values": ["wireless charger"]
        },
        {
          "type": "exact",
          "metadata": "country",
          "values": ["uk"]
        }
      ],
      "actions": {
        "ids": [
          "uk1"
        ]
      }
    }
  ])

For one of these rules to match, one of the following conditions must hold:

  • my_query contains the string "wireless charger" and country is "us"
  • my_query contains the string "wireless charger" and country is "uk"
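The two conditions above can be traced locally. The sketch below is a plain Python simulation of the matching logic, not Elasticsearch's implementation: it covers only the contains and exact criteria types used in this article, assumes case-sensitive string comparison, and the helper names criterion_matches and pinned_ids are hypothetical.

```python
def criterion_matches(criterion, match_criteria):
    """Check one criterion against the values supplied with the query."""
    supplied = match_criteria.get(criterion["metadata"])
    if supplied is None:
        return False
    if criterion["type"] == "exact":
        return any(supplied == value for value in criterion["values"])
    if criterion["type"] == "contains":
        return any(value in supplied for value in criterion["values"])
    raise ValueError(f"criterion type not covered by this sketch: {criterion['type']}")


def pinned_ids(rules, match_criteria):
    """Collect the ids pinned by every rule whose criteria all match."""
    ids = []
    for rule in rules:
        if all(criterion_matches(c, match_criteria) for c in rule["criteria"]):
            ids.extend(rule["actions"]["ids"])
    return ids


def charger_rule(rule_id, country, doc_id):
    """Build one rule in the same shape as the ruleset in this article."""
    return {
        "rule_id": rule_id,
        "type": "pinned",
        "criteria": [
            {"type": "contains", "metadata": "my_query", "values": ["wireless charger"]},
            {"type": "exact", "metadata": "country", "values": [country]},
        ],
        "actions": {"ids": [doc_id]},
    }


rules = [charger_rule("us-charger", "us", "us1"), charger_rule("uk-charger", "uk", "uk1")]
query = "reliable wireless charger for iPhone"
for country in ("us", "uk", "fr"):
    print(country, pinned_ids(rules, {"my_query": query, "country": country}))
```

A rule applies only when every one of its criteria matches, so a user from a country without a rule, or a query that does not mention "wireless charger", gets no pinned document.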

We can also view our ruleset through the API (using the other helper, pretty_ruleset, for readability):

response = client.query_ruleset.get(ruleset_id="promotion-rules")
pretty_ruleset(response)

Next, we run the same search through rule_query, passing the search text and the user's country as match criteria:

response = client.search(index="products_index", query={
    "rule_query": {
        # The organic query is the normal search the rules are applied on top of
        "organic": {
            "multi_match": {
                "query": "reliable wireless charger for iPhone",
                "fields": ["name^5", "description"]
            }
        },
        # These values are checked against the criteria of each rule
        "match_criteria": {
            "my_query": "reliable wireless charger for iPhone",
            "country": "us"
        },
        "ruleset_id": "promotion-rules"
    }
})

pretty_response(response)
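To compare results for different countries, it can help to build the query body in one place. The following is a minimal sketch under that assumption; build_rule_query is a hypothetical helper that only assembles the same dictionary shape used in this article and does not contact Elasticsearch.

```python
def build_rule_query(search_text, country, ruleset_id="promotion-rules"):
    """Assemble a rule_query body for the given search text and country."""
    return {
        "rule_query": {
            # The organic query is the ordinary search without any rules.
            "organic": {
                "multi_match": {
                    "query": search_text,
                    "fields": ["name^5", "description"],
                }
            },
            # Values that the ruleset's criteria are evaluated against.
            "match_criteria": {"my_query": search_text, "country": country},
            "ruleset_id": ruleset_id,
        }
    }


uk_query = build_rule_query("reliable wireless charger for iPhone", "uk")
print(uk_query["rule_query"]["match_criteria"])
```

Passing the returned dictionary as the query argument of client.search, once with "us" and once with "uk", makes it easy to see how the pinned document changes with the country.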

The source code of the full notebook can be downloaded from: github.com/liu-xiao-gu...
