
This is a follow-up to my previous articles:

- Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (Part 1)
- Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (Part 2)

In today's article, I will describe in detail how to use [ElasticsearchStore](https://integrations.langchain.com/vectorstores?integration_%3EElasticsearchStore). This is also the recommended way of using LangChain together with Elasticsearch. If you have not yet set up your environment, please read the first article carefully.

### Create the application and demonstrate it

### Install packages

```python
#!pip3 install langchain
```
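The notebook below reads its credentials from a `.env` file via `python-dotenv`. Before running it, a quick check that the expected variables are actually set can save a confusing failure later. This helper is my own addition, not part of the original notebook; the variable names match the ones the article's code reads:

```python
import os

# Variable names the notebook reads with os.getenv() (the names come from
# the article's code; the check itself is an added convenience).
REQUIRED = ["OPENAI_API_KEY", "ES_USER", "ES_PASSWORD", "ES_ENDPOINT"]

def missing_env(required=REQUIRED):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

if missing_env():
    print("Add these to your .env file:", ", ".join(missing_env()))
```

Run this after `load_dotenv()` if you want the values from `.env` to be taken into account.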
```python
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import ElasticsearchStore
from langchain.text_splitter import CharacterTextSplitter
from urllib.request import urlopen
import os, json

load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY')
elastic_user = os.getenv('ES_USER')
elastic_password = os.getenv('ES_PASSWORD')
elastic_endpoint = os.getenv("ES_ENDPOINT")
elastic_index_name = 'elasticsearch-store'
```

### Add documents and split them into passages

```python
with open('workplace-docs.json') as f:
    workplace_docs = json.load(f)

print(f"Successfully loaded {len(workplace_docs)} documents")
```

```python
metadata = []
content = []

for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append({
        "name": doc["name"],
        "summary": doc["summary"],
        "rolePermissions": doc["rolePermissions"]
    })

text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=0)
docs = text_splitter.create_documents(content, metadatas=metadata)
```

### Write the data into Elasticsearch

```python
from elasticsearch import Elasticsearch

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
connection = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

es = ElasticsearchStore.from_documents(
    docs,
    embedding=embeddings,
    es_url=url,
    es_connection=connection,
    index_name=elastic_index_name,
    es_user=elastic_user,
    es_password=elastic_password)
```

### Display the results

```python
def showResults(output):
    print("Total results: ", len(output))
    for index in range(len(output)):
        print(output[index])
```

### Similarity / Vector Search (Approximate KNN Search) - ApproxRetrievalStrategy()

```python
query = "work from home policy"
result = es.similarity_search(query=query)

showResults(result)
```

### Hybrid Search (Approximate KNN + Keyword Search) - ApproxRetrievalStrategy()

We run the following code:

```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ApproxRetrievalStrategy(
        hybrid=True
    )
)

es.similarity_search("work from home policy")
```

This fails with an error. The reason is that the current license level does not support [RRF](https://elasticstack.blog.csdn.net/article/details/131200354). We go to Kibana and activate the trial license, then run the code again. This time the search succeeds.

### Exact KNN Search (Brute Force) - ExactRetrievalStrategy()

```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore(
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    embedding=embeddings,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.ExactRetrievalStrategy()
)

es.similarity_search("work from home policy")
```

### Index / Search Documents using ELSER - SparseVectorRetrievalStrategy()

In this step, we need to start ELSER. For details on deploying it, please read the article "[Elasticsearch: Deploying ELSER - Elastic Learned Sparse Encoder](https://elasticstack.blog.csdn.net/article/details/131180664)".

```python
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

es = ElasticsearchStore.from_documents(
    docs,
    es_url=url,
    es_connection=connection,
    es_user=elastic_user,
    es_password=elastic_password,
    index_name=elastic_index_name,
    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy()
)

es.similarity_search("work from home policy")
```

The complete Jupyter notebook for the code above can be downloaded from [github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore.ipynb](https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore.ipynb).