Elasticsearch: RBAC and RAG - Best Friends (Part 2)

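In the previous article, "Elasticsearch: RBAC and RAG - Best Friends (Part 1)", we described in detail how to use RBAC to control what a RAG application can access. In this article, we walk through a Jupyter notebook that implements it.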

Installation

If you have not yet installed your own Elasticsearch and Kibana, please refer to the following links to install them:

For the installation, we choose Elastic Stack 8.x. Note in particular that ES|QL is only available in Elastic Stack 8.11 and later, so you need to download and install version 8.11 or later.

When Elasticsearch starts for the first time, we see output like the following:

We need to note down the password of the Elasticsearch superuser elastic.

We can also find the Elasticsearch certificates in the Elasticsearch installation directory:

$ pwd
/Users/liuxg/elastic/elasticsearch-8.13.2/config/certs
$ ls
http.p12      http_ca.crt   transport.p12

Of the files above, http_ca.crt is the certificate we need in order to access Elasticsearch.
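
As a side note, the same certificate can also be identified by its SHA-256 fingerprint. The sketch below is only an illustration: the helper name `ca_fingerprint` is hypothetical, and it assumes the file contains a single PEM certificate. The Python client accepts such a fingerprint through its `ssl_assert_fingerprint` option as an alternative to `ca_certs`, although the notebook in this article does not use it.

```python
import hashlib
import ssl


def ca_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (lowercase hex) of one PEM certificate."""
    # Convert the PEM text to raw DER bytes, then hash those bytes
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der_bytes).hexdigest()


# Possible usage against the copied file (the path is an assumption):
# with open("http_ca.crt") as f:
#     print(ca_fingerprint(f.read()))
```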

First, we clone the code that has already been written:

git clone https://github.com/liu-xiao-guo/elasticsearch-labs

Then we change into the project directory for this article:

$ pwd
/Users/liuxg/python/elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends
$ cp ~/elastic/elasticsearch-8.13.2/config/certs/http_ca.crt .
$ ls
http_ca.crt                     rbac-and-rag-best-friends.ipynb

Above, we copy the Elasticsearch certificate into the current directory. The file rbac-and-rag-best-friends.ipynb is the notebook we walk through below.

Walkthrough

Before running the Jupyter notebook, we set the following variables on the command line:

export ES_USER="elastic"
export ES_PASSWORD="VDMlz5QnM_0g-349fFq7"
export ES_ENDPOINT="localhost"

Adjust these values to match your own setup. Then, in the same terminal, run the following command:

jupyter notebook
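
The notebook reads these variables with `os.getenv`, which returns `None` when a variable is not set, so a forgotten `export` only shows up later as a confusing connection error. A minimal sketch of a fail-fast check follows; the function name `read_es_settings` is hypothetical and not part of the original notebook.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def read_es_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the three connection settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_es_settings()` at the top of the notebook would stop with a clear message before any connection is attempted.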

Install and import the required Python libraries

!pip install elasticsearch python-dotenv

from elasticsearch import Elasticsearch
from IPython.display import HTML, display
from pprint import pprint
from dotenv import load_dotenv
import os, json

After running the commands above, we can check the version of the installed elasticsearch package:

$ pip list | grep elasticsearch
elasticsearch                 8.13.0

Connect the client to Elasticsearch

Create the Elasticsearch connection

load_dotenv()

ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

# Keep the credentials out of the URL so that printing it does not leak the password
url = f"https://{ES_ENDPOINT}:9200"
print(url)

es = Elasticsearch(
    url,
    basic_auth=(ES_USER, ES_PASSWORD),
    ca_certs="./http_ca.crt",
    verify_certs=True,
)
print(es.info())

For more on connecting to Elasticsearch from Python, see the article "Elasticsearch:关于在 Python 中使用 Elasticsearch 你需要知道的一切 - 8.x" (everything you need to know about using Elasticsearch in Python, 8.x).

Delete the demo indices (if they already exist)

# Delete indices
def delete_indices():
    try:
        es.indices.delete(index="rbac_rag_demo-data_public")
        print("Deleted index: rbac_rag_demo-data_public")
    except Exception as e:
        print(f"Error deleting index rbac_rag_demo-data_public: {str(e)}")

    try:
        es.indices.delete(index="rbac_rag_demo-data_sensitive")
        print("Deleted index: rbac_rag_demo-data_sensitive")
    except Exception as e:
        print(f"Error deleting index rbac_rag_demo-data_sensitive: {str(e)}")


delete_indices()

Create the indices and load data into them

# Create indices
def create_indices():
    # ignore_status=400 tolerates the "index already exists" error.
    # The sample documents below use `title`, which is not declared in either
    # mapping, so it is added by dynamic mapping at index time.

    # Create data_public index
    es.options(ignore_status=400).indices.create(
        index="rbac_rag_demo-data_public",
        settings={"number_of_shards": 1},
        mappings={"properties": {"info": {"type": "text"}}},
    )

    # Create data_sensitive index
    es.options(ignore_status=400).indices.create(
        index="rbac_rag_demo-data_sensitive",
        settings={"number_of_shards": 1},
        mappings={
            "properties": {
                "document": {"type": "text"},
                "confidentiality_level": {"type": "keyword"},
            }
        },
    )


# Populate sample data
def populate_data():
    # Public HR information
    public_docs = [
        {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
        {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
        {
            "title": "Health benefits registration period starts next month.",
            "confidentiality_level": "low",
        },
    ]
    for doc in public_docs:
        es.index(index="rbac_rag_demo-data_public", document=doc)

    # Sensitive HR information
    sensitive_docs = [
        {
            "title": "Executive compensation details Q2 2024.",
            "confidentiality_level": "high",
        },
        {
            "title": "Bonus payout structure for all levels.",
            "confidentiality_level": "high",
        },
        {
            "title": "Employee stock options plan details.",
            "confidentiality_level": "high",
        },
    ]
    for doc in sensitive_docs:
        es.index(index="rbac_rag_demo-data_sensitive", document=doc)


create_indices()
populate_data()
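
The notebook indexes one document per request, which is fine for six documents. For larger data sets, a common alternative is to send the documents in one bulk request. The sketch below only builds the action dictionaries; the function name `to_bulk_actions` is hypothetical, and the call that actually sends them is left as a comment because it needs a live cluster.

```python
from typing import Iterable, Iterator


def to_bulk_actions(index: str, docs: Iterable[dict]) -> Iterator[dict]:
    """Yield one bulk "index" action per document for the given index."""
    for doc in docs:
        yield {"_op_type": "index", "_index": index, "_source": doc}


public_docs = [
    {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
    {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
]
actions = list(to_bulk_actions("rbac_rag_demo-data_public", public_docs))
print(actions[0]["_index"])

# With a connected client `es`, the actions could be sent with:
#   from elasticsearch import helpers
#   helpers.bulk(es, actions)
```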

We can use the following command in Kibana to view the indices:

Create two users with different access levels

# Create roles
def create_roles():
    # Role for the engineer: read access to the public index only
    es.security.put_role(
        name="engineer_role",
        body={
            "indices": [
                {"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}
            ]
        },
    )

    # Role for the manager: read access to both indices
    es.security.put_role(
        name="manager_role",
        body={
            "indices": [
                {
                    "names": [
                        "rbac_rag_demo-data_public",
                        "rbac_rag_demo-data_sensitive",
                    ],
                    "privileges": ["read"],
                }
            ]
        },
    )


# Create users with respective roles
def create_users():
    # User 'engineer'
    es.security.put_user(
        username="engineer",
        body={
            "password": "password123",
            "roles": ["engineer_role"],
            "full_name": "Engineer User",
        },
    )

    # User 'manager'
    es.security.put_user(
        username="manager",
        body={
            "password": "password123",
            "roles": ["manager_role"],
            "full_name": "Manager User",
        },
    )


create_roles()
create_users()

After running the code above, we can check the result in Kibana:

We could also create these users and roles through the Kibana UI. For details, see the article "Elasticsearch:用户安全设置" (user security settings).
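
The two role definitions differ only in the list of index names. As a small illustration, the role bodies could be generated from a single table; the helper `read_only_role` and the `ROLE_INDICES` mapping are hypothetical names, not part of the original notebook.

```python
def read_only_role(index_names: list[str]) -> dict:
    """Build a role body granting only the `read` privilege on the given indices."""
    return {"indices": [{"names": list(index_names), "privileges": ["read"]}]}


ROLE_INDICES = {
    "engineer_role": ["rbac_rag_demo-data_public"],
    "manager_role": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
}

role_bodies = {name: read_only_role(indices) for name, indices in ROLE_INDICES.items()}

# With a connected client `es`, each body could be sent as:
#   es.security.put_role(name=name, body=body)
```

Keeping the mapping in one place makes it easier to see at a glance which role can read which index.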

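Test how security roles affect the ability to query data

Create helper functions

Helper functions for querying as each user, plus some output formatting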

# Elastic Cloud variant, kept for reference:
# def get_es_connection(cid, username, password):
#     return Elasticsearch(cloud_id=cid, basic_auth=(username, password))


def get_es_connection(username, password):
    # Connect to the self-managed cluster as the given user
    url = f"https://{ES_ENDPOINT}:9200"
    return Elasticsearch(
        url,
        basic_auth=(username, password),
        ca_certs="./http_ca.crt",
        verify_certs=True,
    )


def query_index(es, index_name, username):
    try:
        response = es.search(index=index_name, query={"match_all": {}})

        # Prepare the message
        results_message = f'Results from querying as <span style="color: orange;">{username}:</span><br>'
        for hit in response["hits"]["hits"]:
            confidentiality_level = hit["_source"].get("confidentiality_level", "N/A")
            # Use a separate name so the index_name argument is not overwritten
            hit_index = hit.get("_index", "N/A")
            title = hit["_source"].get("title", "No title")

            # Set color based on confidentiality level
            if confidentiality_level == "low":
                conf_color = "lightgreen"
            elif confidentiality_level == "high":
                conf_color = "red"
            else:
                conf_color = "black"

            # Set color based on index name
            if hit_index == "rbac_rag_demo-data_public":
                index_color = "lightgreen"
            elif hit_index == "rbac_rag_demo-data_sensitive":
                index_color = "red"
            else:
                index_color = "black"  # Default color

            results_message += (
                f'Index: <span style="color: {index_color};">{hit_index}</span>\t '
                f'confidentiality level: <span style="color: {conf_color};">{confidentiality_level}</span> '
                f'title: <span style="color: lightblue;">{title}</span><br>'
            )

        display(HTML(results_message))

    except Exception as e:
        print(f"Error accessing {index_name}: {str(e)}")
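
The helper above only renders the hits as HTML. In a RAG setting, the hits a user is allowed to see would typically be handed to the LLM as context. The following is a rough sketch of that step under stated assumptions: the function name `hits_to_context` and the `max_docs` parameter are hypothetical, and the input is a plain dictionary shaped like a search response (for example `response.body` from the Python client).

```python
def hits_to_context(response: dict, max_docs: int = 5) -> str:
    """Join the `title` of each hit into a newline-separated context string."""
    hits = response.get("hits", {}).get("hits", [])
    lines = []
    for hit in hits[:max_docs]:
        title = hit.get("_source", {}).get("title")
        if title:  # Skip hits that carry no title
            lines.append(f"- {title} (index: {hit.get('_index', 'N/A')})")
    return "\n".join(lines)
```

Because the search runs with the user's own credentials, a context built this way can only contain documents that the user's role permits.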

Simulate queries as the "engineer" and the "manager"

index_pattern = "rbac_rag_demo-data*"
print(
    f"Each user will log in with their credentials and query the same index pattern: {index_pattern}\n\n"
)

for user in ["engineer", "manager"]:
    print(f"Logged in as {user}:")

    es_conn = get_es_connection(user, "password123")
    results = query_index(es_conn, index_pattern, user)
    print("\n\n")

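From the output above, we can see that the manager can read data from both indices, whereas the engineer can only read the data that the engineer role grants access to, namely the public index.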
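
The outcome can be reasoned about without a cluster: both users query the same wildcard pattern, and the pattern only resolves to indices that the user's role can read. The sketch below is a purely local model of that behaviour using `fnmatch`. The names `ROLE_READABLE` and `visible_indices` are hypothetical, and the code does not call Elasticsearch.

```python
from fnmatch import fnmatch

ALL_INDICES = ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]

# Readable indices per user, mirroring the two roles created earlier
ROLE_READABLE = {
    "engineer": ["rbac_rag_demo-data_public"],
    "manager": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
}


def visible_indices(user: str, pattern: str) -> list[str]:
    """Indices matching `pattern` that `user` may read (local model only)."""
    allowed = set(ROLE_READABLE.get(user, []))
    return [name for name in ALL_INDICES if fnmatch(name, pattern) and name in allowed]


for user in ("engineer", "manager"):
    print(user, visible_indices(user, "rbac_rag_demo-data*"))
```

In this model the engineer sees one index and the manager sees two, which matches what the notebook displays.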

The final source code is available at elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends/rbac-and-rag-best-friends.ipynb on the main branch of the liu-xiao-guo/elasticsearch-labs repository on GitHub.
