In the previous article, "Elasticsearch: RBAC and RAG - Best Friends (Part 1)", we described in detail how to use RBAC to control access in RAG. In this article, we walk through a Jupyter notebook that implements it.
Installation
If you have not yet installed Elasticsearch and Kibana, refer to the following articles:
- How to install Elasticsearch on Linux, macOS, and Windows
- Kibana: How to install Kibana in the Elastic Stack on Linux, macOS, and Windows
For this installation we use Elastic Stack 8.x. Note that ES|QL is only available in Elastic Stack 8.11 and later, so download and install version 8.11 or later.
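When Elasticsearch starts for the first time, it prints its security configuration to the console. We need to note down the password of the Elasticsearch superuser elastic from that output.
We can also find the Elasticsearch access certificates in the Elasticsearch installation directory: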
bash
$ pwd
/Users/liuxg/elastic/elasticsearch-8.13.2/config/certs
$ ls
http.p12 http_ca.crt transport.p12
Above, http_ca.crt is the certificate we need in order to access Elasticsearch.
First, we clone the prepared code:
bash
git clone https://github.com/liu-xiao-guo/elasticsearch-labs
Then we change into the directory of this project:
bash
$ pwd
/Users/liuxg/python/elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends
$ cp ~/elastic/elasticsearch-8.13.2/config/certs/http_ca.crt .
$ ls
http_ca.crt rbac-and-rag-best-friends.ipynb
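Above, we copy the Elasticsearch certificate into the current directory. The file rbac-and-rag-best-friends.ipynb is the notebook we walk through below.
Demo
Before running the Jupyter notebook, we set the following environment variables on the command line: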
bash
export ES_USER="elastic"
export ES_PASSWORD="VDMlz5QnM_0g-349fFq7"
export ES_ENDPOINT="localhost"
Change these values to match your own configuration. Then, in the same terminal, run the following command:
jupyter notebook
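The notebook later reads these three variables with os.getenv, which returns None when a variable is unset, so a typo in an export would only surface as a failed connection. As a minimal sketch, a check like the following could be run first; the helper name missing_env_vars is hypothetical and not part of the original notebook.

```python
import os

# The three variables exported above
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Elasticsearch connection variables are set.")
```

Passing the environment in as a parameter keeps the check easy to try with a plain dictionary instead of the real process environment.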
Install and import the required Python libraries
python
!pip install elasticsearch python-dotenv
python
from elasticsearch import Elasticsearch
from IPython.display import HTML, display
from pprint import pprint
from dotenv import load_dotenv
import os, json
After running the commands above, we can check the version of the installed elasticsearch package:
bash
$ pip list | grep elasticsearch
elasticsearch 8.13.0
Connect the client to Elasticsearch
Create the Elasticsearch connection
python
load_dotenv()

ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

url = f"https://{ES_USER}:{ES_PASSWORD}@{ES_ENDPOINT}:9200"
print(url)  # Note: this prints the password in clear text

es = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)
print(es.info())
For more on connecting to Elasticsearch from Python, see the article "Elasticsearch: Everything you need to know about using Elasticsearch in Python - 8.x".
Delete the demo indices (if they already exist)
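The connection URL above embeds the user name and password directly. That works for the sample password, but a password containing characters such as "@" or "/" would produce an ambiguous URL. The sketch below percent-encodes the credentials with the standard library; build_es_url is a hypothetical helper, and it assumes the client decodes percent-encoded credentials taken from the URL.

```python
from urllib.parse import quote


def build_es_url(user: str, password: str, host: str, port: int = 9200) -> str:
    """Build an HTTPS URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including "/" and "@"
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Hypothetical password containing URL-reserved characters
print(build_es_url("elastic", "p@ss/word", "localhost"))
# prints https://elastic:p%40ss%2Fword@localhost:9200
```

An alternative that avoids the URL question entirely is to pass the credentials separately through the basic_auth=(username, password) argument, as the commented-out Cloud variant of get_es_connection in this notebook does.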
python
# Delete indices
def delete_indices():
    try:
        es.indices.delete(index="rbac_rag_demo-data_public")
        print("Deleted index: rbac_rag_demo-data_public")
    except Exception as e:
        print(f"Error deleting index rbac_rag_demo-data_public: {str(e)}")

    try:
        es.indices.delete(index="rbac_rag_demo-data_sensitive")
        print("Deleted index: rbac_rag_demo-data_sensitive")
    except Exception as e:
        print(f"Error deleting index rbac_rag_demo-data_sensitive: {str(e)}")


delete_indices()
Create the indices and load data into them
python
# Create indices
def create_indices():
    # Create data_public index
    es.indices.create(
        index="rbac_rag_demo-data_public",
        ignore=400,
        body={
            "settings": {"number_of_shards": 1},
            "mappings": {"properties": {"info": {"type": "text"}}},
        },
    )

    # Create data_sensitive index
    es.indices.create(
        index="rbac_rag_demo-data_sensitive",
        ignore=400,
        body={
            "settings": {"number_of_shards": 1},
            "mappings": {
                "properties": {
                    "document": {"type": "text"},
                    "confidentiality_level": {"type": "keyword"},
                }
            },
        },
    )


# Populate sample data
# Note: the documents use a "title" field that is not in the explicit
# mappings above; with default dynamic mapping it is added on first index.
def populate_data():
    # Public HR information
    public_docs = [
        {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
        {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
        {
            "title": "Health benefits registration period starts next month.",
            "confidentiality_level": "low",
        },
    ]
    for doc in public_docs:
        es.index(index="rbac_rag_demo-data_public", document=doc)

    # Sensitive HR information
    sensitive_docs = [
        {
            "title": "Executive compensation details Q2 2024.",
            "confidentiality_level": "high",
        },
        {
            "title": "Bonus payout structure for all levels.",
            "confidentiality_level": "high",
        },
        {
            "title": "Employee stock options plan details.",
            "confidentiality_level": "high",
        },
    ]
    for doc in sensitive_docs:
        es.index(index="rbac_rag_demo-data_sensitive", document=doc)


create_indices()
populate_data()
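populate_data() sends one request per document, which is fine for six documents. For larger data sets, the documents could be turned into bulk actions first. The following is a sketch only: to_bulk_actions is a hypothetical helper that is not in the original notebook, and the final call to elasticsearch.helpers.bulk is left as a comment because it needs a live cluster.

```python
from typing import Dict, Iterable, Iterator


def to_bulk_actions(index: str, docs: Iterable[Dict]) -> Iterator[Dict]:
    """Yield one bulk 'index' action per document."""
    for doc in docs:
        yield {"_op_type": "index", "_index": index, "_source": doc}


public_docs = [
    {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
    {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
]
actions = list(to_bulk_actions("rbac_rag_demo-data_public", public_docs))
print(len(actions), actions[0]["_index"])

# With a live client `es`, the actions could be sent in one request:
# from elasticsearch.helpers import bulk
# bulk(es, actions)
```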
We can then check the two indices in Kibana.
Create two users with different access levels
python
# Create roles
def create_roles():
    # Role for the engineer
    es.security.put_role(
        name="engineer_role",
        body={
            "indices": [
                {"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}
            ]
        },
    )

    # Role for the manager
    es.security.put_role(
        name="manager_role",
        body={
            "indices": [
                {
                    "names": [
                        "rbac_rag_demo-data_public",
                        "rbac_rag_demo-data_sensitive",
                    ],
                    "privileges": ["read"],
                }
            ]
        },
    )


# Create users with respective roles
def create_users():
    # User 'engineer'
    es.security.put_user(
        username="engineer",
        body={
            "password": "password123",
            "roles": ["engineer_role"],
            "full_name": "Engineer User",
        },
    )

    # User 'manager'
    es.security.put_user(
        username="manager",
        body={
            "password": "password123",
            "roles": ["manager_role"],
            "full_name": "Manager User",
        },
    )


create_roles()
create_users()
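After running the code above, we can verify the new roles and users in Kibana.
We can also create these users and roles through the Kibana UI. For details, see the article "Elasticsearch: User security settings".
Test how security roles affect the ability to query data
Create helper functions
Helper functions for querying as each user, plus some output formatting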
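To make the effect of the two roles concrete, the following sketch models in plain Python which indices a role can read when a query uses a wildcard index pattern. It is a simplified illustration, not how Elasticsearch evaluates privileges internally: it only considers the read privilege and treats index names and patterns as shell-style wildcards. The names ROLES, ALL_INDICES, and readable_indices are hypothetical.

```python
from fnmatch import fnmatchcase

# Role definitions mirroring the bodies passed to put_role
ROLES = {
    "engineer_role": {
        "indices": [{"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}]
    },
    "manager_role": {
        "indices": [
            {
                "names": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
                "privileges": ["read"],
            }
        ]
    },
}

ALL_INDICES = ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]


def readable_indices(role_names, pattern):
    """Return the indices matching `pattern` that at least one role may read."""
    result = []
    for index in ALL_INDICES:
        if not fnmatchcase(index, pattern):
            continue  # the query pattern does not cover this index
        for role_name in role_names:
            grants = ROLES[role_name]["indices"]
            if any(
                "read" in grant["privileges"]
                and any(fnmatchcase(index, name) for name in grant["names"])
                for grant in grants
            ):
                result.append(index)
                break
    return result


print(readable_indices(["engineer_role"], "rbac_rag_demo-data*"))
print(readable_indices(["manager_role"], "rbac_rag_demo-data*"))
```

Under this model, the same pattern resolves to one index for the engineer role and to both indices for the manager role.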
python
"""
def get_es_connection(cid, username, password):
    return Elasticsearch(cloud_id=cid, basic_auth=(username, password))
"""

def get_es_connection(username, password):
    url = f"https://{username}:{password}@{ES_ENDPOINT}:9200"
    print(url)
    return Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)


def query_index(es, index_name, username):
    try:
        response = es.search(index=index_name, body={"query": {"match_all": {}}})

        # Prepare the message
        results_message = f'Results from querying as <span style="color: orange;">{username}:</span><br>'
        for hit in response["hits"]["hits"]:
            confidentiality_level = hit["_source"].get("confidentiality_level", "N/A")
            # Use a separate name so the index_name argument is not overwritten
            hit_index = hit.get("_index", "N/A")
            title = hit["_source"].get("title", "No title")

            # Set color based on confidentiality level
            if confidentiality_level == "low":
                conf_color = "lightgreen"
            elif confidentiality_level == "high":
                conf_color = "red"
            else:
                conf_color = "black"

            # Set color based on index name
            if hit_index == "rbac_rag_demo-data_public":
                index_color = "lightgreen"
            elif hit_index == "rbac_rag_demo-data_sensitive":
                index_color = "red"
            else:
                index_color = "black"  # Default color

            results_message += (
                f'Index: <span style="color: {index_color};">{hit_index}</span>\t '
                f'confidentiality level: <span style="color: {conf_color};">{confidentiality_level}</span> '
                f'title: <span style="color: lightblue;">{title}</span><br>'
            )

        display(HTML(results_message))

    except Exception as e:
        print(f"Error accessing {index_name}: {str(e)}")
Simulate queries by the "engineer" and the "manager"
python
index_pattern = "rbac_rag_demo-data*"
print(
    f"Each user will log in with their credentials and query the same index pattern: {index_pattern}\n\n"
)

for user in ["engineer", "manager"]:
    print(f"Logged in as {user}:")

    es_conn = get_es_connection(user, "password123")
    results = query_index(es_conn, index_pattern, user)
    print("\n\n")
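From the output, we can see that the manager can access the data in both indices, while the engineer can only access the data in the public index, rbac_rag_demo-data_public.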
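In a RAG setup, this difference carries over to the prompt: if the retrieval step runs with the user's own credentials, the context handed to the language model can only contain documents that user may read. The sketch below shows one way the returned hits might be turned into a prompt; build_context and build_prompt are hypothetical helpers, the prompt wording is an assumption, and the sample hits are hand-written in the shape of response["hits"]["hits"].

```python
def build_context(hits):
    """Join the titles of the returned hits into a context block."""
    return "\n".join(f"- {hit['_source'].get('title', 'No title')}" for hit in hits)


def build_prompt(question, hits):
    """Combine the user's question with the documents that user could retrieve."""
    context = build_context(hits) or "(no documents available)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical hits, as the engineer would see them (public index only)
engineer_hits = [
    {
        "_index": "rbac_rag_demo-data_public",
        "_source": {"title": "Annual leave policies updated."},
    },
    {
        "_index": "rbac_rag_demo-data_public",
        "_source": {"title": "Remote work guidelines available."},
    },
]

print(build_prompt("What changed in HR policy?", engineer_hits))
```

Because the sensitive documents never reach the engineer's search response, they cannot appear in the engineer's prompt, and no extra filtering is needed in the application code.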