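In the previous article, "Introducing query rules in Elasticsearch 8.10", we described in detail how to use query rules for search. This interactive notebook shows how to work with query rules using the official Elasticsearch Python client. You will use the query rules API to store query rules in Elasticsearch, and then query them with rule_query.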
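Installation
Install Elasticsearch and Kibana
If you have not yet installed Elasticsearch and Kibana, refer to the following articles to install them:
When installing, choose Elastic Stack 8.x. During installation, you will see output like the following:
Environment variables
Before starting Jupyter, set the following environment variables: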
bash
export ES_USER="elastic"
export ES_PASSWORD="xnLj56lTrH98Lf_6n76y"
export ES_ENDPOINT="localhost"
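Change the values above to match your own deployment. These commands must be run before starting Jupyter.
Copy the Elasticsearch certificate
Copy the Elasticsearch certificate into the current directory: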
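Because the notebook reads these variables at startup, a quick check before connecting can save a confusing error later. The following is a minimal sketch, not part of the original notebook; the helper name `missing_env_vars` is hypothetical, and the variable names are taken from the export commands above.

```python
import os

# Names taken from the export commands above.
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.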
bash
$ pwd
/Users/liuxg/python/elser
$ cp ~/elastic/elasticsearch-8.12.0/config/certs/http_ca.crt .
$ ls http_ca.crt
http_ca.crt
Install the Python dependencies
python3 -m pip install -qU elasticsearch load_dotenv
Prepare the data
Create the following data file in the project's current directory:
query-rules-data.json
json
[
  {
    "id": "us1",
    "content": {
      "name": "PureJuice Pro",
      "description": "PureJuice Pro: Experience the pinnacle of wireless charging. Blending rapid charging tech with sleek design, it ensures your devices are powered swiftly and safely. The future of charging is here.",
      "price": 15.00,
      "currency": "USD",
      "plug_type": "B",
      "voltage": "120v"
    }
  },
  {
    "id": "uk1",
    "content": {
      "name": "PureJuice Pro - UK Compatible",
      "description": "PureJuice Pro: Redefining wireless charging. Seamlessly merging swift charging capabilities with a refined aesthetic, it guarantees your devices receive rapid and secure power. Welcome to the next generation of charging.",
      "price": 20.00,
      "currency": "GBP",
      "plug_type": "G",
      "voltage": "230V"
    }
  },
  {
    "id": "eu1",
    "content": {
      "name": "PureJuice Pro - Wireless Charger suitable for European plugs",
      "description": "PureJuice Pro: Elevating wireless charging. Combining unparalleled charging speeds with elegant design, it promises both rapid and dependable energy for your devices. Embrace the future of wireless charging.",
      "price": 18.00,
      "currency": "EUR",
      "plug_type": "C",
      "voltage": "230V"
    }
  },
  {
    "id": "preview1",
    "content": {
      "name": "PureJuice Pro - Pre-order next version",
      "description": "Newest version of the PureJuice Pro wireless charger, coming soon! The newest model of the PureJuice Pro boasts a 2x faster charge than the current model, and a sturdier cable with an eighteen month full warranty. We also have a battery backup to charge on-the-go, up to two full charges. Pre-order yours today!",
      "price": 36.00,
      "currency": "USD",
      "plug_type": ["B", "C", "G"],
      "voltage": ["230V", "120V"]
    }
  }
]
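Since this file is written by hand, it can help to check its shape before indexing. The sketch below is an illustration rather than part of the original notebook: the helper `find_problems` and the `EXPECTED_FIELDS` set are hypothetical names, and the inline sample stands in for the real file contents.

```python
import json

# Fields each product's "content" object is expected to carry (assumption based on the data file above).
EXPECTED_FIELDS = {"name", "description", "price", "currency", "plug_type", "voltage"}


def find_problems(docs):
    """Return a list of human-readable problems found in the loaded documents."""
    problems = []
    seen_ids = set()
    for position, doc in enumerate(docs):
        doc_id = doc.get("id")
        if not doc_id:
            problems.append(f"entry {position}: missing id")
        elif doc_id in seen_ids:
            problems.append(f"entry {position}: duplicate id {doc_id!r}")
        else:
            seen_ids.add(doc_id)
        missing = EXPECTED_FIELDS - set(doc.get("content", {}))
        if missing:
            problems.append(f"entry {position}: missing fields {sorted(missing)}")
    return problems


# Inline sample with the same shape as one entry of query-rules-data.json.
sample = json.loads("""
[{"id": "us1",
  "content": {"name": "PureJuice Pro", "description": "...", "price": 15.00,
              "currency": "USD", "plug_type": "B", "voltage": "120v"}}]
""")
print(find_problems(sample))

# With the real file in place, the same check would be:
# with open("query-rules-data.json") as f:
#     print(find_problems(json.load(f)))
```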
Build and run the application
Run the following command in the current directory to start the notebook:
bash
$ pwd
/Users/liuxg/python/elser
$ jupyter notebook
Import packages and connect to Elasticsearch
python
from elasticsearch import Elasticsearch
from dotenv import load_dotenv
import os

# Load variables from a local .env file, if one exists
load_dotenv()

openai_api_key=os.getenv('OPENAI_API_KEY')
elastic_user=os.getenv('ES_USER')
elastic_password=os.getenv('ES_PASSWORD')
elastic_endpoint=os.getenv("ES_ENDPOINT")

# Connect over HTTPS, verifying the server against the copied CA certificate
url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
client = Elasticsearch(url, ca_certs = "./http_ca.crt", verify_certs = True)

print(client.info())
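Embedding the user name and password in the URL works for simple passwords, but characters such as "@" or ":" in the password would change how the URL is parsed. One alternative, sketched here as an assumption rather than as the notebook's own approach, is to pass the credentials through the client's `basic_auth` argument. The helper `connection_kwargs` is a hypothetical name; it only builds the keyword arguments, so it can be inspected without a running cluster.

```python
def connection_kwargs(user, password, host, port=9200, ca_certs="./http_ca.crt"):
    """Build keyword arguments for the Elasticsearch client without putting credentials in the URL."""
    return {
        "hosts": f"https://{host}:{port}",
        "basic_auth": (user, password),  # credentials are passed separately, so no URL escaping is needed
        "ca_certs": ca_certs,
        "verify_certs": True,
    }


kwargs = connection_kwargs("elastic", "p@ss:word", "localhost")
print(kwargs["hosts"])

# With the elasticsearch package installed and a cluster running:
# from elasticsearch import Elasticsearch
# client = Elasticsearch(**kwargs)
# print(client.info())
```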
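Index some test data
Our client is now set up and connected to our Elastic deployment. Next we need some data to test the basics of Elasticsearch queries. We will use a small product index with the following fields: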
name
description
price
currency
plug_type
voltage
Run the following to upload some sample data:
python
import json

# Load data into a JSON object
with open('query-rules-data.json') as f:
    docs = json.load(f)

# The bulk API expects an action line followed by the document source
operations = []
for doc in docs:
    operations.append({"index": {"_index": "products_index", "_id": doc["id"]}})
    operations.append(doc["content"])
client.bulk(index="products_index", operations=operations, refresh=True)
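The bulk request body alternates an action line with the document source, and the bulk response reports per-item failures through its `errors` flag and `items` list. The sketch below separates those two steps into small functions so they can be tried without a cluster; `build_bulk_operations` and `failed_ids` are hypothetical helper names, and the response shown is a hand-made example, not real output.

```python
def build_bulk_operations(docs, index_name):
    """Turn [{'id': ..., 'content': {...}}, ...] into the alternating action/source list the bulk API expects."""
    operations = []
    for doc in docs:
        operations.append({"index": {"_index": index_name, "_id": doc["id"]}})
        operations.append(doc["content"])
    return operations


def failed_ids(bulk_response):
    """Return the _id of every item in a bulk response that carries an error."""
    if not bulk_response.get("errors"):
        return []
    failed = []
    for item in bulk_response["items"]:
        # Each item is keyed by its action type, for example {"index": {...}}
        for result in item.values():
            if "error" in result:
                failed.append(result["_id"])
    return failed


docs = [{"id": "us1", "content": {"name": "PureJuice Pro"}}]
operations = build_bulk_operations(docs, "products_index")
print(operations)

# Hand-made response for illustration only
example_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "us1", "status": 201}},
        {"index": {"_id": "uk1", "status": 400, "error": {"type": "mapper_parsing_exception"}}},
    ],
}
print(failed_ids(example_response))
```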
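We can view the indexed documents in Kibana:
Search the test data
First, let's search the data for a "reliable wireless charger".
Before searching, we define some convenience functions that print the raw JSON response from Elasticsearch in a more readable format.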
python
def pretty_response(response):
    """Print each search hit as a readable, multi-line block."""
    if len(response['hits']['hits']) == 0:
        print('Your search returned no results.')
    else:
        for hit in response['hits']['hits']:
            doc_id = hit['_id']
            score = hit['_score']
            name = hit['_source']['name']
            description = hit['_source']['description']
            price = hit["_source"]["price"]
            currency = hit["_source"]["currency"]
            plug_type = hit["_source"]["plug_type"]
            voltage = hit["_source"]["voltage"]
            pretty_output = (f"\nID: {doc_id}\nName: {name}\nDescription: {description}\nPrice: {price}\nCurrency: {currency}\nPlug type: {plug_type}\nVoltage: {voltage}\nScore: {score}")
            print(pretty_output)

def pretty_ruleset(response):
    """Print a ruleset: each rule's ID, type, criteria, and pinned document IDs."""
    print("Ruleset ID: " + response['ruleset_id'])
    for rule in response['rules']:
        rule_id = rule['rule_id']
        rule_type = rule['type']
        print(f"\nRule ID: {rule_id}\n\tType: {rule_type}\n\tCriteria:")
        criteria = rule['criteria']
        for rule_criteria in criteria:
            criteria_type = rule_criteria['type']
            metadata = rule_criteria['metadata']
            values = rule_criteria['values']
            print(f"\t\t{metadata} {criteria_type} {values}")
        ids = rule['actions']['ids']
        print(f"\tPinned ids: {ids}")
Next, run the search.
A normal search without query rules
python
response = client.search(index="products_index", query={
    "multi_match": {
        "query": "reliable wireless charger for iPhone",
        "fields": [ "name^5", "description" ]
    }
})

pretty_response(response)
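Create query rules
Let's assume that we know which country our users are coming from (perhaps through geolocation by IP address, or from the logged-in user's account information). We now want to create query rules that promote the wireless charger appropriate for that country whenever people search for anything containing the phrase "wireless charger".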
python
client.query_ruleset.put(ruleset_id="promotion-rules", rules=[
    {
        "rule_id": "us-charger",
        "type": "pinned",
        "criteria": [
            {
                "type": "contains",
                "metadata": "my_query",
                "values": ["wireless charger"]
            },
            {
                "type": "exact",
                "metadata": "country",
                "values": ["us"]
            }
        ],
        "actions": {
            "ids": [
                "us1"
            ]
        }
    },
    {
        "rule_id": "uk-charger",
        "type": "pinned",
        "criteria": [
            {
                "type": "contains",
                "metadata": "my_query",
                "values": ["wireless charger"]
            },
            {
                "type": "exact",
                "metadata": "country",
                "values": ["uk"]
            }
        ],
        "actions": {
            "ids": [
                "uk1"
            ]
        }
    }
])
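For one of these rules to match, one of the following conditions must hold:
- my_query contains the string "wireless charger" and country is "us"
- my_query contains the string "wireless charger" and country is "uk"
We can also view our ruleset through the API (using the other helper, pretty_ruleset, for readability):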
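To make the matching conditions above concrete, here is a simplified local simulation. It is not how Elasticsearch implements rule evaluation; it is a sketch that assumes all criteria of a rule must hold, handles only the "contains" and "exact" criteria types, and compares strings case-sensitively. The function names `criterion_matches` and `pinned_ids` are hypothetical.

```python
def criterion_matches(criterion, match_criteria):
    """Check one criterion against the metadata supplied with a search."""
    value = match_criteria.get(criterion["metadata"])
    if value is None:
        return False
    if criterion["type"] == "exact":
        return any(value == candidate for candidate in criterion["values"])
    if criterion["type"] == "contains":
        return any(candidate in value for candidate in criterion["values"])
    raise ValueError(f"criteria type not covered by this sketch: {criterion['type']}")


def pinned_ids(rules, match_criteria):
    """Collect the pinned document IDs of every rule whose criteria all match."""
    ids = []
    for rule in rules:
        if all(criterion_matches(c, match_criteria) for c in rule["criteria"]):
            ids.extend(rule["actions"]["ids"])
    return ids


# Compact copy of the two rules defined in the ruleset
rules = [
    {"rule_id": "us-charger", "type": "pinned",
     "criteria": [{"type": "contains", "metadata": "my_query", "values": ["wireless charger"]},
                  {"type": "exact", "metadata": "country", "values": ["us"]}],
     "actions": {"ids": ["us1"]}},
    {"rule_id": "uk-charger", "type": "pinned",
     "criteria": [{"type": "contains", "metadata": "my_query", "values": ["wireless charger"]},
                  {"type": "exact", "metadata": "country", "values": ["uk"]}],
     "actions": {"ids": ["uk1"]}},
]

query = "reliable wireless charger for iPhone"
print(pinned_ids(rules, {"my_query": query, "country": "us"}))
print(pinned_ids(rules, {"my_query": query, "country": "de"}))
```

A search from a country with no matching rule, or a query without the phrase, pins nothing, so only the organic results would be returned.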
python
response = client.query_ruleset.get(ruleset_id="promotion-rules")
pretty_ruleset(response)
Next, run the same search through rule_query, passing the match criteria and the ruleset ID:
python
response = client.search(index="products_index", query={
    "rule_query": {
        "organic": {
            "multi_match": {
                "query": "reliable wireless charger for iPhone",
                "fields": [ "name^5", "description" ]
            }
        },
        "match_criteria": {
            "my_query": "reliable wireless charger for iPhone",
            "country": "us"
        },
        "ruleset_id": "promotion-rules"
    }
})

pretty_response(response)
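The query text appears twice in this request (once in the organic query and once in the match criteria), and only the country changes between users. A small builder function keeps the two copies in sync. This is a sketch under the assumptions of this walkthrough: `build_rule_query` is a hypothetical helper, and the field names and ruleset ID are the ones used above.

```python
def build_rule_query(query_text, country, ruleset_id="promotion-rules"):
    """Build a rule_query body that reuses the same text for the organic query and the match criteria."""
    return {
        "rule_query": {
            "organic": {
                "multi_match": {
                    "query": query_text,
                    "fields": ["name^5", "description"],
                }
            },
            "match_criteria": {
                "my_query": query_text,
                "country": country,
            },
            "ruleset_id": ruleset_id,
        }
    }


uk_query = build_rule_query("reliable wireless charger for iPhone", "uk")
print(uk_query["rule_query"]["match_criteria"])

# With a connected client, the search for a UK user would be:
# response = client.search(index="products_index", query=uk_query)
# pretty_response(response)
```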
The full source of the notebook can be downloaded from: github.com/liu-xiao-gu...