
This is a follow-up to my previous articles:

- Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (Part 1)
- Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (Part 2)

In today's article, I will describe in detail how to use [ElasticsearchStore](https://integrations.langchain.com/vectorstores?integration_%3EElasticsearchStore). This is also the recommended way of using LangChain together with Elasticsearch. If you have not yet set up your environment, please read the first article carefully.

### Create the application and demonstrate it

### Install packages

```python
#!pip3 install langchain
```
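The notebook below reads its credentials from a `.env` file via `python-dotenv`. Before running it, a quick check that the expected variables are actually set can save a confusing failure later. This helper is my own addition, not part of the original notebook; the variable names match the ones the article's code reads:

```python
import os

# Variable names the notebook reads with os.getenv() (the names come from
# the article's code; the check itself is an added convenience).
REQUIRED = ["OPENAI_API_KEY", "ES_USER", "ES_PASSWORD", "ES_ENDPOINT"]

def missing_env(required=REQUIRED):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

if missing_env():
    print("Add these to your .env file:", ", ".join(missing_env()))
```

Run this after `load_dotenv()` if you want the values from `.env` to be taken into account.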
```python
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import ElasticsearchStore
from langchain.text_splitter import CharacterTextSplitter
from urllib.request import urlopen
import os, json

load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY')
elastic_user = os.getenv('ES_USER')
elastic_password = os.getenv('ES_PASSWORD')
elastic_endpoint = os.getenv("ES_ENDPOINT")
elastic_index_name = 'elasticsearch-store'
```

### Add documents and split them into passages

```python
with open('workplace-docs.json') as f:
    workplace_docs = json.load(f)

print(f"Successfully loaded {len(workplace_docs)} documents")
```

```python
metadata = []
content = []

for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append({
        "name": doc["name"],
        "summary": doc["summary"],
        "rolePermissions": doc["rolePermissions"]
    })

text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=0)
docs = text_splitter.create_documents(content, metadatas=metadata)
```

### Write the data into Elasticsearch

```python
from elasticsearch import Elasticsearch

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
connection = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

es = ElasticsearchStore.from_documents(
    docs,
    embedding=embeddings,
    es_url=url,
    es_connection=connection,
    index_name=elastic_index_name,
    es_user=elastic_user,
    es_password=elastic_password)
```

### Display the results

```python
def showResults(output):
    print("Total results: ", len(output))
    for index in range(len(output)):
        print(output[index])
```

### Similarity / Vector Search (Approximate KNN Search) - ApproxRetrievalStrategy()

```python
query = "work from home policy"
result = es.similarity_search(query=query)

showResults(result)
```

### Hybrid Search (Approximate KNN + Keyword Search) - ApproxRetrievalStrategy()

We run the following code:

```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(
        hybrid=True
    )
)

es.similarity_search("work from home policy")
```

This fails with an error. The reason is that the current license level does not support [RRF](https://elasticstack.blog.csdn.net/article/details/131200354). We go to Kibana and activate the trial license, then run the code again. This time the search succeeds.

### Exact KNN Search (Brute Force) - ExactRetrievalStrategy()

```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ExactRetrievalStrategy()
)

es.similarity_search("work from home policy")
```

### Index / Search Documents using ELSER - SparseVectorRetrievalStrategy()

In this step, we need to start ELSER. For details on deploying it, please read the article "[Elasticsearch: Deploying ELSER - Elastic Learned Sparse Encoder](https://elasticstack.blog.csdn.net/article/details/131180664)".

```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore.from_documents(
    docs,
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy()
)

es.similarity_search("work from home policy")
```

The complete Jupyter notebook for the code above can be downloaded from [github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore.ipynb](https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore.ipynb).