Elasticsearch: Searching with query rules

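In the earlier article "Elasticsearch 8.10 中引入查询规则 - query rules" (Introducing query rules in Elasticsearch 8.10), we described in detail how to search with query rules. This interactive notebook shows how to work with query rules through the official Elasticsearch Python client. You will store query rules in Elasticsearch with the query rules API and then search with them using rule_query.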

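Installation

Install Elasticsearch and Kibana

If you have not yet installed Elasticsearch and Kibana, refer to the following articles to install them:

Choose Elastic Stack 8.x when installing. During installation, the following information is displayed: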

Environment variables

Before starting Jupyter, set the following environment variables:

export ES_USER="elastic"
export ES_PASSWORD="xnLj56lTrH98Lf_6n76y"
export ES_ENDPOINT="localhost"

Replace the values above with those of your own deployment. The exports must be run before starting Jupyter.
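Because the notebook reads these values with os.getenv, a variable that was never exported only shows up later as a confusing connection error. The following is a minimal sketch of an early check; the helper name `missing_env_vars` is hypothetical and not part of the original notebook.

```python
import os

# The three variables this walkthrough expects to find in the environment.
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this in the first notebook cell makes it obvious whether Jupyter was started from the shell where the exports were made.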

Copy the Elasticsearch certificate

Copy the Elasticsearch certificate into the current directory:

$ pwd
/Users/liuxg/python/elser
$ cp ~/elastic/elasticsearch-8.12.0/config/certs/http_ca.crt .
$ ls http_ca.crt
http_ca.crt

Install Python dependencies

python3 -m pip install -qU elasticsearch load_dotenv

Prepare the data

Create the following data file in the project's current directory:

query-rules-data.json

[
  {
    "id": "us1",
    "content": {
      "name": "PureJuice Pro",
      "description": "PureJuice Pro: Experience the pinnacle of wireless charging. Blending rapid charging tech with sleek design, it ensures your devices are powered swiftly and safely. The future of charging is here.",
      "price": 15.00,
      "currency": "USD",
      "plug_type": "B",
      "voltage": "120v"
    }
  },
  {
    "id": "uk1",
    "content": {
      "name": "PureJuice Pro - UK Compatible",
      "description": "PureJuice Pro: Redefining wireless charging. Seamlessly merging swift charging capabilities with a refined aesthetic, it guarantees your devices receive rapid and secure power. Welcome to the next generation of charging.",
      "price": 20.00,
      "currency": "GBP",
      "plug_type": "G",
      "voltage": "230V"
    }
  },
  {
    "id": "eu1",
    "content": {
      "name": "PureJuice Pro - Wireless Charger suitable for European plugs",
      "description": "PureJuice Pro: Elevating wireless charging. Combining unparalleled charging speeds with elegant design, it promises both rapid and dependable energy for your devices. Embrace the future of wireless charging.",
      "price": 18.00,
      "currency": "EUR",
      "plug_type": "C",
      "voltage": "230V"
    }
  },
  {
    "id": "preview1",
    "content": {
      "name": "PureJuice Pro - Pre-order next version",
      "description": "Newest version of the PureJuice Pro wireless charger, coming soon! The newest model of the PureJuice Pro boasts a 2x faster charge than the current model, and a sturdier cable with an eighteen month full warranty. We also have a battery backup to charge on-the-go, up to two full charges. Pre-order yours today!",
      "price": 36.00,
      "currency": "USD",
      "plug_type": ["B", "C", "G"],
      "voltage": ["230V", "120V"]
    }
  }
]

Create the application and walk through it

Run the following command in the current directory to start the notebook:

$ pwd
/Users/liuxg/python/elser
$ jupyter notebook

Import packages and connect to Elasticsearch

from elasticsearch import Elasticsearch
from dotenv import load_dotenv
import os

# Load variables from a local .env file, if one exists
load_dotenv()

elastic_user = os.getenv("ES_USER")
elastic_password = os.getenv("ES_PASSWORD")
elastic_endpoint = os.getenv("ES_ENDPOINT")

# Connect over HTTPS and verify the server against the copied CA certificate
url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
client = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

print(client.info())
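Embedding the password in the URL works for the sample password, but it can break if the password contains characters such as "@" or "/". One alternative, sketched below, is to pass the credentials through the client's basic_auth parameter. The helper name `connection_kwargs` and the fallback values given to os.getenv are illustrative assumptions, not part of the original notebook.

```python
import os


def connection_kwargs(user, password, host, port=9200, ca_certs="./http_ca.crt"):
    """Build keyword arguments for Elasticsearch(...) with credentials kept out of the URL."""
    return {
        "hosts": f"https://{host}:{port}",
        "basic_auth": (user, password),
        "ca_certs": ca_certs,
        "verify_certs": True,
    }


kwargs = connection_kwargs(
    os.getenv("ES_USER", "elastic"),        # placeholder default
    os.getenv("ES_PASSWORD", "changeme"),   # placeholder default
    os.getenv("ES_ENDPOINT", "localhost"),  # placeholder default
)
# With the client library installed: client = Elasticsearch(**kwargs)
print(kwargs["hosts"])
```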

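Index some test data

Our client is now set up and connected to our Elastic deployment. Next we need some data to try out basic Elasticsearch queries. We will use a small product index with the following fields: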

  • name
  • description
  • price
  • currency
  • plug_type
  • voltage

Run the following to upload some sample data:

import json

# Load data into a JSON object
with open('query-rules-data.json') as f:
    docs = json.load(f)

operations = []
for doc in docs:
    operations.append({"index": {"_index": "products_index", "_id": doc["id"]}})
    operations.append(doc["content"])
client.bulk(index="products_index", operations=operations, refresh=True)
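The bulk call expects a flat list in which every document is preceded by an action line. The sketch below isolates that step in a small function so the resulting shape can be inspected without a running cluster. The function name `build_bulk_operations` and the shortened sample documents are illustrative, not part of the original notebook.

```python
def build_bulk_operations(docs, index_name="products_index"):
    """Interleave an "index" action line with each document body."""
    operations = []
    for doc in docs:
        # Action line: which index and which _id the next entry belongs to
        operations.append({"index": {"_index": index_name, "_id": doc["id"]}})
        # Source line: the document fields themselves
        operations.append(doc["content"])
    return operations


# Two shortened documents in the same shape as query-rules-data.json
sample_docs = [
    {"id": "us1", "content": {"name": "PureJuice Pro", "currency": "USD"}},
    {"id": "uk1", "content": {"name": "PureJuice Pro - UK Compatible", "currency": "GBP"}},
]

for entry in build_bulk_operations(sample_docs):
    print(entry)
```

Two documents produce four entries: action, source, action, source.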

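We can check the result in Kibana:

Search the test data

First, let's search the data for "reliable wireless charger".

Before searching, we define a few convenience functions that print the raw JSON response from Elasticsearch in a more readable form.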

def pretty_response(response):
    if len(response['hits']['hits']) == 0:
        print('Your search returned no results.')
    else:
        for hit in response['hits']['hits']:
            doc_id = hit['_id']
            score = hit['_score']
            name = hit['_source']['name']
            description = hit['_source']['description']
            price = hit["_source"]["price"]
            currency = hit["_source"]["currency"]
            plug_type = hit["_source"]["plug_type"]
            voltage = hit["_source"]["voltage"]
            pretty_output = (f"\nID: {doc_id}\nName: {name}\nDescription: {description}\nPrice: {price}\nCurrency: {currency}\nPlug type: {plug_type}\nVoltage: {voltage}\nScore: {score}")
            print(pretty_output)

def pretty_ruleset(response):
    print("Ruleset ID: " + response['ruleset_id'])
    for rule in response['rules']:
        rule_id = rule['rule_id']
        rule_type = rule['type']
        print(f"\nRule ID: {rule_id}\n\tType: {rule_type}\n\tCriteria:")
        criteria = rule['criteria']
        for rule_criteria in criteria:
            criteria_type = rule_criteria['type']
            metadata = rule_criteria['metadata']
            values = rule_criteria['values']
            print(f"\t\t{metadata} {criteria_type} {values}")
        ids = rule['actions']['ids']
        print(f"\tPinned ids: {ids}")

Next, run the search

A normal search without query rules

response = client.search(index="products_index", query={
    "multi_match": {
        "query": "reliable wireless charger for iPhone",
        "fields": [ "name^5", "description" ]
    }
})

pretty_response(response)

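Create query rules

Let's assume that we know which country our users come from (for example through geolocation of the IP address, or from the logged-in user's account information). We now want to create query rules so that when people search for anything containing the phrase "wireless charger", the wireless charger that suits their country is promoted based on that information.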

client.query_ruleset.put(ruleset_id="promotion-rules", rules=[
    {
      "rule_id": "us-charger",
      "type": "pinned",
      "criteria": [
        {
          "type": "contains",
          "metadata": "my_query",
          "values": ["wireless charger"]
        },
        {
          "type": "exact",
          "metadata": "country",
          "values": ["us"]
        }
      ],
      "actions": {
        "ids": [
          "us1"
        ]
      }
    },
    {
      "rule_id": "uk-charger",
      "type": "pinned",
      "criteria": [
        {
          "type": "contains",
          "metadata": "my_query",
          "values": ["wireless charger"]
        },
        {
          "type": "exact",
          "metadata": "country",
          "values": ["uk"]
        }
      ],
      "actions": {
        "ids": [
          "uk1"
        ]
      }
    }
  ])

For these rules to match, one of the following conditions must be met:

  • my_query contains the string "wireless charger" and country is "us"
  • my_query contains the string "wireless charger" and country is "uk"
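The two conditions can be illustrated with a small pure-Python model. This is a deliberately simplified sketch and not how Elasticsearch evaluates rules internally: it handles only the "contains" and "exact" criteria types used here, compares strings as-is, and the function names `criterion_matches` and `pinned_ids` are hypothetical.

```python
def criterion_matches(criterion, match_criteria):
    """Check one criterion against the values supplied with the search."""
    value = match_criteria.get(criterion["metadata"])
    if value is None:
        return False
    if criterion["type"] == "exact":
        return any(value == candidate for candidate in criterion["values"])
    if criterion["type"] == "contains":
        return any(candidate in value for candidate in criterion["values"])
    raise ValueError("criteria type not covered by this sketch: " + criterion["type"])


def pinned_ids(rules, match_criteria):
    """Collect the pinned ids of every rule whose criteria all match."""
    ids = []
    for rule in rules:
        if all(criterion_matches(c, match_criteria) for c in rule["criteria"]):
            ids.extend(rule["actions"]["ids"])
    return ids


def charger_rule(rule_id, country, doc_id):
    """Build one rule in the same shape as the ruleset above."""
    return {
        "rule_id": rule_id,
        "type": "pinned",
        "criteria": [
            {"type": "contains", "metadata": "my_query", "values": ["wireless charger"]},
            {"type": "exact", "metadata": "country", "values": [country]},
        ],
        "actions": {"ids": [doc_id]},
    }


rules = [charger_rule("us-charger", "us", "us1"), charger_rule("uk-charger", "uk", "uk1")]

print(pinned_ids(rules, {"my_query": "reliable wireless charger for iPhone", "country": "us"}))
print(pinned_ids(rules, {"my_query": "reliable wireless charger for iPhone", "country": "de"}))
```

In this model a search from "us" pins us1, while a search from a country without a rule pins nothing and would fall back to the organic results alone.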

We can also view our ruleset through the API (using the other helper, pretty_ruleset, for readability):

response = client.query_ruleset.get(ruleset_id="promotion-rules")
pretty_ruleset(response)

Next, run the same search through rule_query, passing the user's query and country as match criteria:

response = client.search(index="products_index", query={
      "rule_query": {
          "organic": {
              "multi_match": {
                  "query": "reliable wireless charger for iPhone",
                  "fields": [ "name^5", "description" ]
              }
          },
          "match_criteria": {
            "my_query": "reliable wireless charger for iPhone",
            "country": "us"
          },
          "ruleset_id": "promotion-rules"
      }
})

pretty_response(response)
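To try the same search for another country, the query body can be assembled by a small helper. The following is a sketch only; the function name `build_rule_query` is hypothetical, and it assumes the same field boosts and ruleset id as the search above.

```python
def build_rule_query(user_query, country, ruleset_id="promotion-rules"):
    """Wrap an organic multi_match query in a rule_query for the given country."""
    return {
        "rule_query": {
            # The organic query produces the regular, relevance-ranked results
            "organic": {
                "multi_match": {
                    "query": user_query,
                    "fields": ["name^5", "description"],
                }
            },
            # These values are compared against the criteria of each rule
            "match_criteria": {"my_query": user_query, "country": country},
            "ruleset_id": ruleset_id,
        }
    }


query = build_rule_query("reliable wireless charger for iPhone", "uk")
print(query["rule_query"]["match_criteria"])
# With a connected client:
#   response = client.search(index="products_index", query=query)
#   pretty_response(response)
```

With "uk" as the country, the uk-charger rule defined earlier is the one whose criteria are satisfied, so uk1 is the document expected to be pinned.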

The source code of the whole notebook can be downloaded from: github.com/liu-xiao-gu...
