Elasticsearch: RBAC and RAG - Best Friends (Part 2)

In the previous article, "Elasticsearch: RBAC and RAG - Best Friends (Part 1)", we described in detail how to use RBAC to control access in RAG. In this article, we walk through a Jupyter notebook that implements it.

Installation

If you have not yet installed your own Elasticsearch and Kibana, please install them first.

For this walkthrough we install Elastic Stack 8.x. Note that ES|QL is only available in Elastic Stack 8.11 and later, so download and install version 8.11 or later.

When Elasticsearch starts for the first time, it prints its security configuration to the console.

We need to write down the password of the Elasticsearch superuser elastic.

We can also find the certificates used to access Elasticsearch in the Elasticsearch installation directory:

$ pwd
/Users/liuxg/elastic/elasticsearch-8.13.2/config/certs
$ ls
http.p12      http_ca.crt   transport.p12

Of the files above, http_ca.crt is the certificate we need in order to access Elasticsearch.
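As a side note, it can be handy to know the SHA-256 fingerprint of this CA certificate, for example to compare it with the fingerprint Elasticsearch reports. The following is a minimal sketch using only the Python standard library; the function name pem_fingerprint is our own, and it assumes the file holds a single PEM-encoded certificate.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint of one PEM certificate as colon-separated hex."""
    # Strip the PEM armor and base64-decode the body into DER bytes
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    # Group the hex digest into byte pairs: "AB:CD:..."
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

To use it on the real file, read http_ca.crt as text and pass the contents to pem_fingerprint. As far as we know, the 8.x Python client also accepts such a fingerprint through its ssl_assert_fingerprint parameter as an alternative to ca_certs, but this walkthrough sticks with the certificate file.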

First, we clone the prepared code:

git clone https://github.com/liu-xiao-guo/elasticsearch-labs

Then we change into the directory of this project:

$ pwd
/Users/liuxg/python/elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends
$ cp ~/elastic/elasticsearch-8.13.2/config/certs/http_ca.crt .
$ ls
http_ca.crt                     rbac-and-rag-best-friends.ipynb

Above, we copied the Elasticsearch certificate into the current directory. The file rbac-and-rag-best-friends.ipynb is the notebook we walk through below.

Demonstration

Before running the Jupyter notebook, we set the following environment variables on the command line:

export ES_USER="elastic"
export ES_PASSWORD="VDMlz5QnM_0g-349fFq7"
export ES_ENDPOINT="localhost"

Adjust these values to match your own setup. Then, in the same terminal, run the following command:
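Because the notebook reads these variables at startup, a missing one only shows up later as a confusing connection error. The sketch below checks all three up front; the helper name read_es_settings is hypothetical and not part of the original notebook.

```python
import os
from typing import Mapping

# The three variables exported above
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def read_es_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling read_es_settings() at the top of the notebook fails fast with the names of the variables that still need to be exported.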

jupyter notebook

Install and import the required Python libraries

!pip install elasticsearch python-dotenv

from elasticsearch import Elasticsearch
from IPython.display import HTML, display
from pprint import pprint
from dotenv import load_dotenv
import json
import os

After running the commands above, we can check the version of the installed elasticsearch package:

$ pip list | grep elasticsearch
elasticsearch                 8.13.0
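The same check can be done from inside the notebook, which is useful when the notebook kernel uses a different environment than the shell. This is a small sketch using the standard library's importlib.metadata; the helper name installed_version is our own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Shows the client version seen by the current Python environment
print(installed_version("elasticsearch"))
```

It is generally safest to keep the client's major version in line with the Elasticsearch server's major version, 8.x in this walkthrough.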

Connect the client to Elasticsearch

Create the Elasticsearch connection

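# Load variables from a .env file if present; values already exported in the shell are kept
load_dotenv()

ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

# Keep the credentials out of the URL so the password is never printed
url = f"https://{ES_ENDPOINT}:9200"
print(url)

# Verify the server certificate against the CA file copied earlier
es = Elasticsearch(
    url,
    basic_auth=(ES_USER, ES_PASSWORD),
    ca_certs="./http_ca.crt",
    verify_certs=True,
)
print(es.info())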

For more on connecting to Elasticsearch from Python, see the article "Elasticsearch: Everything you need to know about using Elasticsearch in Python - 8.x".

Delete the demo indices (if they already exist)

# Delete the demo indices; errors (such as a missing index) are reported, not raised
def delete_indices():
    for index_name in ("rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"):
        try:
            es.indices.delete(index=index_name)
            print(f"Deleted index: {index_name}")
        except Exception as e:
            print(f"Error deleting index {index_name}: {str(e)}")


delete_indices()

Create the indices and load data into them

# Create indices
def create_indices():
    # ignore_status=400 suppresses the HTTP 400 returned when an index already exists
    client = es.options(ignore_status=400)

    # Map the fields that the sample documents below actually use
    mappings = {
        "properties": {
            "title": {"type": "text"},
            "confidentiality_level": {"type": "keyword"},
        }
    }

    # Create data_public index
    client.indices.create(
        index="rbac_rag_demo-data_public",
        settings={"number_of_shards": 1},
        mappings=mappings,
    )

    # Create data_sensitive index
    client.indices.create(
        index="rbac_rag_demo-data_sensitive",
        settings={"number_of_shards": 1},
        mappings=mappings,
    )


# Populate sample data
def populate_data():
    # Public HR information
    public_docs = [
        {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
        {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
        {
            "title": "Health benefits registration period starts next month.",
            "confidentiality_level": "low",
        },
    ]
    for doc in public_docs:
        es.index(index="rbac_rag_demo-data_public", document=doc)

    # Sensitive HR information
    sensitive_docs = [
        {
            "title": "Executive compensation details Q2 2024.",
            "confidentiality_level": "high",
        },
        {
            "title": "Bonus payout structure for all levels.",
            "confidentiality_level": "high",
        },
        {
            "title": "Employee stock options plan details.",
            "confidentiality_level": "high",
        },
    ]
    for doc in sensitive_docs:
        es.index(index="rbac_rag_demo-data_sensitive", document=doc)

    # Make the new documents searchable right away
    es.indices.refresh(index="rbac_rag_demo-data_*")


create_indices()
populate_data()

We can then check the two indices in Kibana.
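The same check can also be done from the notebook. Below is a minimal sketch that asks the count API for the number of documents in each index; the helper name doc_counts is our own, and it assumes it is given a connected client such as the es object created earlier.

```python
def doc_counts(client, index_names):
    """Return a {index name: document count} mapping using the count API."""
    return {name: client.count(index=name)["count"] for name in index_names}


# Example call against the demo indices (requires a live cluster):
# doc_counts(es, ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"])
```

With the sample data above, each index should report three documents once the newly indexed documents have been refreshed.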

Create two users with different access levels

# Create roles
def create_roles():
    # Role for the engineer: read access to the public index only
    es.security.put_role(
        name="engineer_role",
        indices=[
            {"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}
        ],
    )

    # Role for the manager: read access to both indices
    es.security.put_role(
        name="manager_role",
        indices=[
            {
                "names": [
                    "rbac_rag_demo-data_public",
                    "rbac_rag_demo-data_sensitive",
                ],
                "privileges": ["read"],
            }
        ],
    )


# Create users with respective roles
def create_users():
    # User 'engineer'
    es.security.put_user(
        username="engineer",
        password="password123",
        roles=["engineer_role"],
        full_name="Engineer User",
    )

    # User 'manager'
    es.security.put_user(
        username="manager",
        password="password123",
        roles=["manager_role"],
        full_name="Manager User",
    )


create_roles()
create_users()

After running the code above, the new roles and users can be seen in Kibana.

We could also create these users and roles through the Kibana UI. For details, see the article "Elasticsearch: User security settings".
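To confirm the assignments without leaving the notebook, we can read a user back through the security API. The following is a small sketch; roles_of is a hypothetical helper name, and it assumes a client connected as a user allowed to manage security, such as elastic.

```python
def roles_of(client, username):
    """Return the list of role names assigned to a user."""
    # The get-user response is keyed by username
    response = client.security.get_user(username=username)
    return response[username]["roles"]


# Example calls (require a live cluster):
# roles_of(es, "engineer")
# roles_of(es, "manager")
```

If the users were created as shown above, the engineer should come back with engineer_role and the manager with manager_role.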

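Test how security roles affect the ability to query data

Create helper functions

Helper functions for querying as each user, plus some output formatting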

# Elastic Cloud variant: connect with a Cloud ID instead of a URL
# def get_es_connection(cid, username, password):
#     return Elasticsearch(cloud_id=cid, basic_auth=(username, password))


def get_es_connection(username, password):
    # Open a separate connection authenticated as the given user
    return Elasticsearch(
        f"https://{ES_ENDPOINT}:9200",
        basic_auth=(username, password),
        ca_certs="./http_ca.crt",
        verify_certs=True,
    )


def query_index(es, index_name, username):
    try:
        response = es.search(index=index_name, query={"match_all": {}})

        # Prepare the message
        results_message = f'Results from querying as <span style="color: orange;">{username}:</span><br>'
        for hit in response["hits"]["hits"]:
            confidentiality_level = hit["_source"].get("confidentiality_level", "N/A")
            # Use a separate name so the index_name argument is not overwritten
            hit_index = hit.get("_index", "N/A")
            title = hit["_source"].get("title", "No title")

            # Set color based on confidentiality level
            if confidentiality_level == "low":
                conf_color = "lightgreen"
            elif confidentiality_level == "high":
                conf_color = "red"
            else:
                conf_color = "black"

            # Set color based on index name
            if hit_index == "rbac_rag_demo-data_public":
                index_color = "lightgreen"
            elif hit_index == "rbac_rag_demo-data_sensitive":
                index_color = "red"
            else:
                index_color = "black"  # Default color

            results_message += (
                f'Index: <span style="color: {index_color};">{hit_index}</span>\t '
                f'confidentiality level: <span style="color: {conf_color};">{confidentiality_level}</span> '
                f'title: <span style="color: lightblue;">{title}</span><br>'
            )

        display(HTML(results_message))

    except Exception as e:
        print(f"Error accessing {index_name}: {str(e)}")
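In a RAG setting, the hits returned for a user are what would be handed to the LLM as context, so whatever RBAC filters out of the search never reaches the prompt. The sketch below shows one way that step might look; build_context and its max_chars parameter are hypothetical and not part of the original notebook.

```python
def build_context(hits, max_chars=1000):
    """Join the titles of search hits into a plain-text context block for a prompt."""
    lines = []
    used = 0
    for hit in hits:
        title = hit.get("_source", {}).get("title")
        if not title:
            continue  # Skip hits that carry no usable text
        line = f"- {title} (index: {hit.get('_index', 'unknown')})"
        if used + len(line) > max_chars:
            break  # Stop before the context grows past the budget
        lines.append(line)
        used += len(line) + 1  # +1 for the joining newline
    return "\n".join(lines)
```

Passing response["hits"]["hits"] from a search made with the engineer's connection would yield a context built only from documents the engineer is allowed to read.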

Simulate queries by the "engineer" and the "manager"

index_pattern = "rbac_rag_demo-data*"
print(
    f"Each user will log in with their credentials and query the same index pattern: {index_pattern}\n\n"
)

for user in ["engineer", "manager"]:
    print(f"Logged in as {user}:")

    es_conn = get_es_connection(user, "password123")
    # query_index displays its results directly and returns nothing
    query_index(es_conn, index_pattern, user)
    print("\n\n")

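From the output, we can see that the manager can read data from both indices, while the engineer can only read the data the engineer role grants access to, which is the public index.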
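Both users ran the same wildcard query, yet they saw different documents, because the pattern is only expanded to indices the user may read. The toy model below illustrates that idea in plain Python; it is not how Elasticsearch implements it, and the names ROLE_READABLE, ALL_INDICES and visible_indices are ours.

```python
from fnmatch import fnmatchcase

# Read grants per role, mirroring the roles created earlier
ROLE_READABLE = {
    "engineer_role": ["rbac_rag_demo-data_public"],
    "manager_role": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
}

ALL_INDICES = ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]


def visible_indices(role, pattern, all_indices=ALL_INDICES):
    """Return the indices matching the pattern that the role is allowed to read."""
    grants = ROLE_READABLE.get(role, [])
    return [
        name
        for name in all_indices
        if fnmatchcase(name, pattern)
        and any(fnmatchcase(name, grant) for grant in grants)
    ]


print(visible_indices("engineer_role", "rbac_rag_demo-data*"))
print(visible_indices("manager_role", "rbac_rag_demo-data*"))
```

In this model the engineer's query resolves to the public index only, while the manager's resolves to both, matching what the notebook shows.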

The final source code is available at elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends/rbac-and-rag-best-friends.ipynb on the main branch of liu-xiao-guo/elasticsearch-labs on GitHub.
