This is a continuation of the previous articles:

- Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (1)
- Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (2)

In today's article, I will describe in detail how to use ElasticsearchStore (integrations.langchain.com/vectorstore...). This is also the recommended way of using it. If you have not yet set up your own environment, please read the first article carefully.

Create the application and demonstrate it

Install the packages

```python
#!pip3 install langchain
```
```python
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import ElasticsearchStore
from langchain.text_splitter import CharacterTextSplitter
from urllib.request import urlopen
import os, json

load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY')
elastic_user = os.getenv('ES_USER')
elastic_password = os.getenv('ES_PASSWORD')
elastic_endpoint = os.getenv("ES_ENDPOINT")
elastic_index_name = 'elasticsearch-store'
```
Add documents and split them into passages
```python
with open('workplace-docs.json') as f:
    workplace_docs = json.load(f)

print(f"Successfully loaded {len(workplace_docs)} documents")
```
```python
metadata = []
content = []

for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append({
        "name": doc["name"],
        "summary": doc["summary"],
        "rolePermissions": doc["rolePermissions"]
    })

text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=0)
docs = text_splitter.create_documents(content, metadatas=metadata)
```
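To illustrate what the splitter does, here is a simplified sketch of fixed-size character chunking in plain Python. It is not LangChain's actual implementation (which splits on a separator first and then merges pieces), but it shows how `chunk_size` and `chunk_overlap` interact:

```python
def split_chunks(text, chunk_size=50, chunk_overlap=0):
    # Slide a fixed-size window over the text; with overlap, the
    # window advances by chunk_size - chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print([len(c) for c in split_chunks("a" * 120)])  # [50, 50, 20]
```

With a non-zero overlap, the tail of one chunk is repeated at the head of the next, which helps avoid cutting a relevant sentence across two chunks.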
Write the data into Elasticsearch
```python
from elasticsearch import Elasticsearch

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
connection = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

es = ElasticsearchStore.from_documents(
    docs,
    embedding=embeddings,
    es_url=url,
    es_connection=connection,
    index_name=elastic_index_name,
    es_user=elastic_user,
    es_password=elastic_password)
```
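Behind the scenes, `ElasticsearchStore.from_documents` creates the index and stores each chunk's text alongside its embedding in a `dense_vector` field. The mapping looks roughly like the following (the field names `text` and `vector`, and the 1536 dimensions of OpenAI's embeddings, are assumptions based on the library's defaults; verify with `GET elasticsearch-store/_mapping` in Kibana):

```json
{
  "mappings": {
    "properties": {
      "text":     { "type": "text" },
      "vector":   { "type": "dense_vector", "dims": 1536, "index": true, "similarity": "cosine" },
      "metadata": { "type": "object" }
    }
  }
}
```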
Display the results
```python
def showResults(output):
    print("Total results: ", len(output))
    for index in range(len(output)):
        print(output[index])
```
Similarity / Vector Search (Approximate KNN Search) - ApproxRetrievalStrategy()
```python
query = "work from home policy"
result = es.similarity_search(query=query)

showResults(result)
```
Hybrid Search (Approximate KNN + Keyword Search) - ApproxRetrievalStrategy()
We now run the following code:
```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(
        hybrid=True
    )
)

es.similarity_search("work from home policy")
```
This error occurs because the current license level does not support RRF. We go to Kibana and activate the appropriate license:

We run the code again:
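Conceptually, the hybrid mode merges the kNN hit list and the BM25 keyword hit list with Reciprocal Rank Fusion (RRF): each document's fused score is the sum of 1/(k + rank) over the lists it appears in. A minimal sketch of the idea (k = 60 is the commonly used ranking constant; the document IDs are invented for illustration):

```python
def rrf(rankings, k=60):
    # rankings: one ranked list of doc IDs per retriever (vector, keyword, ...).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc2", "doc1", "doc3"]
keyword_hits = ["doc2", "doc1", "doc4"]
fused = rrf([vector_hits, keyword_hits])  # doc2 ranks first
```

A document that ranks well in both lists beats one that ranks well in only one, which is why hybrid search is often more robust than either retriever alone.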
Exact KNN Search (Brute Force) - ExactRetrievalStrategy()
```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ExactRetrievalStrategy()
)

es.similarity_search("work from home policy")
```
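Unlike the approximate strategy, exact retrieval skips the ANN index and scores every stored vector against the query, which is precise but O(N) per query. The brute-force idea in a few lines of plain Python (toy 2-dimensional vectors stand in for the real 1536-dimensional embeddings):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def exact_knn(query_vec, doc_vecs, top_k=2):
    # Score every document, then sort: no index, no approximation.
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(exact_knn([1.0, 0.1], doc_vecs))  # ['a', 'c']
```

This is essentially what a `script_score` query does in Elasticsearch: exact answers at the cost of scanning all candidate documents.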
Index / Search Documents using ELSER - SparseVectorRetrievalStrategy()
In this step, we need to activate ELSER. For how to deploy ELSER, please refer to the article "Elasticsearch: Deploy ELSER - Elastic Learned Sparse Encoder".
```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore.from_documents(
    docs,
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy()
)

es.similarity_search("work from home policy")
```
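Note that ELSER does not produce dense vectors: it expands text into weighted tokens (a sparse vector), and relevance is essentially a dot product over the tokens shared by query and document. A toy sketch of that scoring (the tokens and weights below are invented for illustration, not real ELSER output):

```python
def sparse_dot(query_weights, doc_weights):
    # ELSER-style scoring: sum the weight products of tokens
    # that appear in both the query and the document expansion.
    return sum(w * doc_weights[t]
               for t, w in query_weights.items()
               if t in doc_weights)

query = {"home": 1.2, "work": 0.9, "policy": 1.5}
doc = {"work": 0.8, "remote": 1.1, "policy": 1.0}
print(sparse_dot(query, doc))  # 0.9*0.8 + 1.5*1.0 ≈ 2.22
```

Because only overlapping tokens contribute, the sparse representation stays interpretable and can be served by an ordinary inverted index.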
The complete Jupyter notebook for the code above can be downloaded from github.com/liu-xiao-gu...