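Elasticsearch: RBAC and RAG - Best Friends (Part 2)

In the previous article, "Elasticsearch: RBAC and RAG - Best Friends (Part 1)", we described in detail how to use RBAC to control access in RAG. In this article, we walk through a Jupyter notebook that implements it.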

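Installation

If you have not yet installed Elasticsearch and Kibana, please refer to the following links to install them:

For this installation we use Elastic Stack 8.x. Note in particular that ES|QL is only available in Elastic Stack 8.11 and later, so you need to download and install Elastic Stack 8.11 or a later version.

When Elasticsearch starts for the first time, we can see output like the following:

We need to note down the password of the Elasticsearch superuser, elastic.

We can also find the certificates for accessing Elasticsearch in the Elasticsearch installation directory: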

$ pwd
/Users/liuxg/elastic/elasticsearch-8.13.2/config/certs
$ ls
http.p12      http_ca.crt   transport.p12

Above, http_ca.crt is the certificate we need in order to access Elasticsearch.

We first clone the code that has already been written:

git clone https://github.com/liu-xiao-guo/elasticsearch-labs

We then change into the project directory:

$ pwd
/Users/liuxg/python/elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends
$ cp ~/elastic/elasticsearch-8.13.2/config/certs/http_ca.crt .
$ ls
http_ca.crt                     rbac-and-rag-best-friends.ipynb

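Above, we copy the Elasticsearch certificate into the current directory. The file rbac-and-rag-best-friends.ipynb is the notebook we demonstrate below.

Demonstration

Before running the Jupyter notebook, we set the following variables on the command line:

export ES_USER="elastic"
export ES_PASSWORD="VDMlz5QnM_0g-349fFq7"
export ES_ENDPOINT="localhost"

Adjust these values to match your own configuration. Then, in the same terminal, run the following command: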

jupyter notebook
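
Because the notebook reads ES_USER, ES_PASSWORD and ES_ENDPOINT from the environment, a missing variable only shows up later as a confusing connection error. The following is a minimal sketch of an early check that could be run in the first notebook cell. The names `read_es_settings` and `REQUIRED_VARS` are hypothetical and are not part of the original notebook.

```python
import os

# The three variables exported in the terminal before starting Jupyter.
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def read_es_settings(env=None):
    """Return the connection settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with an explicit dict instead of the real environment.
example_env = {"ES_USER": "elastic", "ES_PASSWORD": "secret", "ES_ENDPOINT": "localhost"}
print(sorted(read_es_settings(example_env)))
```

In the notebook itself, calling `read_es_settings()` without arguments would check the real environment.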

Install and import the required Python libraries

!pip install elasticsearch python-dotenv

from elasticsearch import Elasticsearch
from IPython.display import HTML, display
from pprint import pprint
from dotenv import load_dotenv
import os, json

After running the commands above, we can check the version of the installed elasticsearch package:

$ pip list | grep elasticsearch
elasticsearch                 8.13.0

Connect the client to Elasticsearch

Create the Elasticsearch connection

load_dotenv()

ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

# Keep the credentials out of the URL so the password is never printed
url = f"https://{ES_ENDPOINT}:9200"
print(url)

# Verify the server certificate against the CA file copied earlier
es = Elasticsearch(
    url,
    basic_auth=(ES_USER, ES_PASSWORD),
    ca_certs="./http_ca.crt",
    verify_certs=True,
)
print(es.info())

For more on connecting to Elasticsearch from Python, see the article "Elasticsearch: Everything you need to know about using Elasticsearch in Python - 8.x".

Delete the demo indices (if they already exist)

# Delete indices
def delete_indices():
    try:
        es.indices.delete(index="rbac_rag_demo-data_public")
        print("Deleted index: rbac_rag_demo-data_public")
    except Exception as e:
        print(f"Error deleting index rbac_rag_demo-data_public: {str(e)}")

    try:
        es.indices.delete(index="rbac_rag_demo-data_sensitive")
        print("Deleted index: rbac_rag_demo-data_sensitive")
    except Exception as e:
        print(f"Error deleting index rbac_rag_demo-data_sensitive: {str(e)}")


delete_indices()

Create the indices and load data into them

# Create indices
def create_indices():
    # ignore_status=400 skips the error raised when an index already exists
    client = es.options(ignore_status=400)

    # Create data_public index
    client.indices.create(
        index="rbac_rag_demo-data_public",
        settings={"number_of_shards": 1},
        mappings={"properties": {"info": {"type": "text"}}},
    )

    # Create data_sensitive index
    client.indices.create(
        index="rbac_rag_demo-data_sensitive",
        settings={"number_of_shards": 1},
        mappings={
            "properties": {
                "document": {"type": "text"},
                "confidentiality_level": {"type": "keyword"},
            }
        },
    )


# Populate sample data
def populate_data():
    # Note: the documents use a "title" field, which is not in the explicit
    # mappings above and is therefore mapped dynamically by Elasticsearch.

    # Public HR information
    public_docs = [
        {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
        {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
        {
            "title": "Health benefits registration period starts next month.",
            "confidentiality_level": "low",
        },
    ]
    for doc in public_docs:
        es.index(index="rbac_rag_demo-data_public", document=doc)

    # Sensitive HR information
    sensitive_docs = [
        {
            "title": "Executive compensation details Q2 2024.",
            "confidentiality_level": "high",
        },
        {
            "title": "Bonus payout structure for all levels.",
            "confidentiality_level": "high",
        },
        {
            "title": "Employee stock options plan details.",
            "confidentiality_level": "high",
        },
    ]
    for doc in sensitive_docs:
        es.index(index="rbac_rag_demo-data_sensitive", document=doc)

    # Make the new documents searchable right away
    es.indices.refresh(index="rbac_rag_demo-data*")


create_indices()
populate_data()

We can use the following command in Kibana to view the indices:

Create two users with different access levels

# Create roles
def create_roles():
    # Role for the engineer: read access to the public index only.
    # The role name matches the one assigned in create_users() below.
    es.security.put_role(
        name="engineer_role",
        indices=[
            {"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}
        ],
    )

    # Role for the manager: read access to both indices
    es.security.put_role(
        name="manager_role",
        indices=[
            {
                "names": [
                    "rbac_rag_demo-data_public",
                    "rbac_rag_demo-data_sensitive",
                ],
                "privileges": ["read"],
            }
        ],
    )


# Create users with respective roles
def create_users():
    # User 'engineer'
    es.security.put_user(
        username="engineer",
        password="password123",
        roles=["engineer_role"],
        full_name="Engineer User",
    )

    # User 'manager'
    es.security.put_user(
        username="manager",
        password="password123",
        roles=["manager_role"],
        full_name="Manager User",
    )


create_roles()
create_users()

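After running the code above, we can check the result in Kibana:

We could also create these users and roles through the Kibana UI. For details, see the article "Elasticsearch: User security settings".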
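
To reason about what each role should be able to see before querying the cluster, the role definitions can be treated as a mapping from role name to readable indices. The sketch below is a toy model in plain Python that needs no Elasticsearch connection. It is not how Elasticsearch evaluates privileges internally, and `ROLE_INDICES`, `ALL_INDICES` and `visible_indices` are hypothetical names introduced only for illustration.

```python
from fnmatch import fnmatchcase

# Read grants mirroring engineer_role and manager_role from this article.
ROLE_INDICES = {
    "engineer_role": ["rbac_rag_demo-data_public"],
    "manager_role": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
}

# The two demo indices created earlier.
ALL_INDICES = ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]


def visible_indices(role, pattern, all_indices=ALL_INDICES):
    """Return the indices that match `pattern` and that `role` may read."""
    grants = ROLE_INDICES.get(role, [])
    return [
        name
        for name in all_indices
        if fnmatchcase(name, pattern)
        and any(fnmatchcase(name, grant) for grant in grants)
    ]


for role in ROLE_INDICES:
    print(role, "->", visible_indices(role, "rbac_rag_demo-data*"))
```

Under this model, the same index pattern resolves to one index for engineer_role and to two for manager_role, which is the behavior the queries in the next section are meant to confirm against the real cluster.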

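Test how security roles affect the ability to query data

Create helper functions

Helper functions for querying as each user, plus some output formatting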

# Elastic Cloud alternative, kept for reference:
# def get_es_connection(cid, username, password):
#     return Elasticsearch(cloud_id=cid, basic_auth=(username, password))


def get_es_connection(username, password):
    # Connect as the given user; credentials are passed separately from the URL
    return Elasticsearch(
        f"https://{ES_ENDPOINT}:9200",
        basic_auth=(username, password),
        ca_certs="./http_ca.crt",
        verify_certs=True,
    )


def query_index(es, index_name, username):
    try:
        response = es.search(index=index_name, query={"match_all": {}})

        # Prepare the message
        results_message = f'Results from querying as <span style="color: orange;">{username}:</span><br>'
        for hit in response["hits"]["hits"]:
            confidentiality_level = hit["_source"].get("confidentiality_level", "N/A")
            # Use a separate name so the index_name argument is not overwritten
            hit_index = hit.get("_index", "N/A")
            title = hit["_source"].get("title", "No title")

            # Set color based on confidentiality level
            if confidentiality_level == "low":
                conf_color = "lightgreen"
            elif confidentiality_level == "high":
                conf_color = "red"
            else:
                conf_color = "black"

            # Set color based on index name
            if hit_index == "rbac_rag_demo-data_public":
                index_color = "lightgreen"
            elif hit_index == "rbac_rag_demo-data_sensitive":
                index_color = "red"
            else:
                index_color = "black"  # Default color

            results_message += (
                f'Index: <span style="color: {index_color};">{hit_index}</span>\t '
                f'confidentiality level: <span style="color: {conf_color};">{confidentiality_level}</span> '
                f'title: <span style="color: lightblue;">{title}</span><br>'
            )

        display(HTML(results_message))

    except Exception as e:
        print(f"Error accessing {index_name}: {str(e)}")
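
The helper above interpolates document values straight into HTML. That is fine for the hand-written sample data, but a title containing characters such as < or & would be interpreted as markup. A minimal sketch of escaping the values first, using only the standard library, is shown here. `render_hit` is a hypothetical helper and not part of the notebook.

```python
from html import escape


def render_hit(index_name, confidentiality_level, title):
    """Return one HTML line for a hit, escaping every document-derived value."""
    return (
        f"Index: {escape(index_name)} "
        f"confidentiality level: {escape(confidentiality_level)} "
        f"title: {escape(title)}<br>"
    )


# A title with markup characters is rendered as text, not as HTML tags.
print(render_hit("rbac_rag_demo-data_public", "low", "Pay <b>bands</b> & bonuses"))
```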

Simulate queries as the "engineer" and the "manager"

index_pattern = "rbac_rag_demo-data*"
print(
    f"Each user will log in with their credentials and query the same index pattern: {index_pattern}\n\n"
)

for user in ["engineer", "manager"]:
    print(f"Logged in as {user}:")

    # Each user gets their own connection, so their own role applies
    es_conn = get_es_connection(user, "password123")
    query_index(es_conn, index_pattern, user)
    print("\n\n")

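From the output above, we can see that the manager can access the data in both indices, whereas the engineer can only access the data in the public index, rbac_rag_demo-data_public.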
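
In a RAG pipeline, the hits returned for a user become the context handed to the language model, so whatever RBAC filters out of the search never reaches the prompt. The following is a minimal sketch of that step, assuming hits in the shape found under response["hits"]["hits"]. The functions `build_context` and `build_prompt` and the prompt wording are hypothetical, and no LLM call is made.

```python
def build_context(hits, max_docs=5):
    """Join the titles of the first `max_docs` hits into a plain-text context."""
    lines = []
    for hit in hits[:max_docs]:
        source = hit.get("_source", {})
        title = source.get("title", "No title")
        lines.append(f"- {title} (index: {hit.get('_index', 'N/A')})")
    return "\n".join(lines)


def build_prompt(question, hits):
    """Combine the question with the context the user was allowed to retrieve."""
    context = build_context(hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hand-written hits standing in for what the engineer's search would return.
engineer_hits = [
    {
        "_index": "rbac_rag_demo-data_public",
        "_source": {"title": "Annual leave policies updated."},
    }
]
print(build_prompt("What changed in HR policy?", engineer_hits))
```

Because the context is built only from the hits of the user's own connection, the engineer's prompt cannot contain titles from rbac_rag_demo-data_sensitive.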

The final source code is available at elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends/rbac-and-rag-best-friends.ipynb on the main branch of the liu-xiao-guo/elasticsearch-labs repository on GitHub.
