Elasticsearch: Using Ollama with the Inference API

Author: Jeffrey Rengifo, Elastic

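The Ollama API is compatible with the OpenAI API, so integrating Ollama with Elasticsearch is straightforward.

In this article, we'll learn how to use Ollama to connect a local model to the Elasticsearch inference model, and then use Playground to ask questions about our documents.

Elasticsearch lets users connect to LLMs through the open Inference API, which supports providers such as Amazon Bedrock, Cohere, Google AI, Azure AI Studio, and HuggingFace (as a service), among others.

Ollama is a tool that lets you download and run LLM models on your own infrastructure (a local machine or server). You can find the list of available models compatible with Ollama here.

Ollama is a good option if you want to host and test different open source models without worrying about each model needing a different setup, or about how to build an API to access the model's functionality, because Ollama takes care of all of that.

Since the Ollama API is compatible with the OpenAI API, we can easily integrate the inference model and create a RAG application using Playground.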

Prerequisites

Elasticsearch 8.17
Kibana 8.17
Python

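Steps

Set up the Ollama LLM server
Create the mappings
Index the data
Ask questions using Playground

Setting up the Ollama LLM server

We'll set up an LLM server with Ollama and connect it to our Playground instance. We need to:

Download and run Ollama.
Use ngrok to access the local web server hosting Ollama over the internet.

Downloading and running Ollama

To use Ollama, we first need to download it. Ollama supports Linux, Windows, and macOS, so just download the version compatible with your OS here. Once Ollama is installed, we can pick a model from this list of supported LLMs. In this example, we'll use llama3.2, a general-purpose multilingual model. The installation process also enables the Ollama command line tool. Once that's done, you can run the following command to download the model: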

ollama pull llama3.2

This will output:

pulling manifest
pulling dde5aa3fc5ff... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏   96 B
pulling 34bb5ab01051... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████▏  561 B
verifying sha256 digest
writing manifest
success

Once the model is downloaded, you can test it with the following command:

ollama run llama3.2

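Let's ask a question:

While the model is running, Ollama exposes an API that listens on port 11434 by default. Let's make a request to that API, following the official documentation: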

curl http://localhost:11434/api/generate -d '{                                          
  "model": "llama3.2",               
  "prompt": "What is the capital of France?"
}'

This is the response we get:

{"model":"llama3.2","created_at":"2024-11-28T21:48:42.152817532Z","response":"The","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.251884485Z","response":" capital","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.347365913Z","response":" of","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.446837322Z","response":" France","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.542367394Z","response":" is","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.644580384Z","response":" Paris","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.739865362Z","response":".","done":false}
{"model":"llama3.2","created_at":"2024-11-28T21:48:42.834347518Z","response":"","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,3923,374,279,6864,315,9822,30,128009,128006,78191,128007,271,791,6864,315,9822,374,12366,13],"total_duration":6948567145,"load_duration":4386106503,"prompt_eval_count":32,"prompt_eval_duration":1872000000,"eval_count":8,"eval_duration":684000000}

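Note that this endpoint streams its response: each line is a JSON object carrying a fragment of the answer, and the final object has "done": true.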
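Because the answer arrives as newline-delimited JSON, a client has to stitch the fragments together. The following is a minimal Python sketch, not part of the original walkthrough, that joins the "response" fields until it sees "done": true. The function names join_stream and ask_ollama are hypothetical; the URL and payload mirror the curl call above, and only the standard library is assumed.

```python
import json
import urllib.request


def join_stream(lines):
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between JSON objects
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object carries timing stats, not more text
    return "".join(parts)


def ask_ollama(prompt, model="llama3.2", base_url="http://localhost:11434"):
    """Send a prompt to a locally running Ollama server and return the joined answer."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response object can be iterated line by line.
        return join_stream(response)
```

With the server running, ask_ollama("What is the capital of France?") should return the same text that the streamed fragments above spell out.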

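Exposing the endpoint to the internet using ngrok

Since our endpoint runs in a local environment, it cannot be reached over the internet from elsewhere, such as our Elastic Cloud instance. ngrok lets us expose a local port through a public address. Create an ngrok account and follow the official setup guide.

Note: this is somewhat similar to the "花生壳" (Peanut Shell) service available in China.

Once the ngrok agent is installed and configured, we can expose the Ollama port with the following command: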

ngrok http 11434 --host-header="localhost:11434"

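Note: the --host-header="localhost:11434" flag ensures that the "Host" header in incoming requests matches "localhost:11434".

Running this command returns a public link that works as long as ngrok and the Ollama server are running locally.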

Session Status                online
Account                       xxxx@yourEmailProvider.com (Plan: Free)
Version                       3.18.4
Region                        United States (us)
Latency                       561ms
Web Interface                 http://127.0.0.1:4040
Forwarding                    https://your-ngrok-url.ngrok-free.app -> http://localhost:11434

Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00

Under "Forwarding" we can see the URL that ngrok generated. Save it for later.
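If you would rather read the forwarding URL programmatically than copy it from the terminal, the sketch below is one possible approach. It assumes the ngrok agent serves its local inspection API on the "Web Interface" address shown above and that GET /api/tunnels returns a JSON object with a "tunnels" list whose entries carry a "public_url" field; check this against the ngrok documentation for your agent version. The function names are hypothetical.

```python
import json
import urllib.request


def pick_public_url(tunnels_payload):
    """Return the first https public_url in an /api/tunnels payload, or None."""
    for tunnel in tunnels_payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def read_forwarding_url(inspect_url="http://127.0.0.1:4040"):
    """Ask the local ngrok agent (assumed API, see lead-in) for its public URL."""
    with urllib.request.urlopen(f"{inspect_url}/api/tunnels") as response:
        return pick_public_url(json.load(response))
```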

Let's try making an HTTP request to the endpoint again, this time using the ngrok-generated URL:

curl https://your-ngrok-url.ngrok-free.app/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is the capital of France?"
}'

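The response should be similar to the previous one.

Creating the mappings

ELSER endpoint

For this example, we'll create an inference endpoint using the Elasticsearch inference API, and we'll use ELSER to generate the embeddings.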

PUT _inference/sparse_embedding/medicines-inference
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".elser_model_2_linux-x86_64"
  }
}

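For this example, imagine you run a pharmacy that sells two types of medicines:

Medicines that require a prescription.
Medicines that do not require a prescription.

This information is included in the description field of each medicine.

The LLM must interpret this field, so we'll use the following data mappings: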

PUT medicines
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "copy_to": "semantic_field"
      },
      "semantic_field": {
        "type": "semantic_text",
        "inference_id": "medicines-inference"
      },
      "text_description": {
        "type": "text",
        "copy_to": "semantic_field"
      }
    }
  }
}

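The text_description field stores the plain text of the description, while semantic_field, a semantic_text field type, stores the embeddings generated by ELSER.

The copy_to property copies the contents of the name and text_description fields into the semantic field so that embeddings are generated for those fields.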
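As a purely conceptual illustration of the copy_to behaviour, the Python sketch below collects the values that would be routed into semantic_field for a single document. It is not how Elasticsearch implements copy_to; the function name semantic_field_inputs is hypothetical, and treating each source field as a separate value is an assumption that appears consistent with the name and description showing up as separate chunks in the context printed later in this article.

```python
def semantic_field_inputs(doc, copy_sources=("name", "text_description")):
    """Collect the values that copy_to would route into semantic_field."""
    # Each source field contributes its own value; missing or empty fields are skipped.
    return [doc[field] for field in copy_sources if doc.get(field)]


clonazepam = {
    "id": 8,
    "name": "Clonazepam",
    "text_description": "An antiepileptic medication that requires a prescription.",
}
inputs = semantic_field_inputs(clonazepam)
```

Here inputs holds the medicine name followed by its description, which is the text ELSER would embed for that document.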

Indexing the data

Now, let's index the data using the _bulk API.

POST _bulk
{"index":{"_index":"medicines"}}
{"id":1,"name":"Paracetamol","text_description":"An analgesic and antipyretic that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":2,"name":"Ibuprofen","text_description":"A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription."}
{"index":{"_index":"medicines"}}
{"id":3,"name":"Amoxicillin","text_description":"An antibiotic that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":4,"name":"Lorazepam","text_description":"An anxiolytic medication that strictly requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":5,"name":"Omeprazole","text_description":"A medication for stomach acidity that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":6,"name":"Insulin","text_description":"A hormone used in diabetes treatment that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":7,"name":"Cold Medicine","text_description":"A compound formula to relieve flu symptoms available WITHOUT a prescription."}
{"index":{"_index":"medicines"}}
{"id":8,"name":"Clonazepam","text_description":"An antiepileptic medication that requires a prescription."}
{"index":{"_index":"medicines"}}
{"id":9,"name":"Vitamin C","text_description":"A dietary supplement that does NOT require a prescription."}
{"index":{"_index":"medicines"}}
{"id":10,"name":"Metformin","text_description":"A medication used for type 2 diabetes that requires a prescription."}
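The _bulk body above is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. If your medicines live in a Python list, a small helper can produce the same body. This is a sketch using only the standard library; to_bulk_body is a hypothetical name, and sending the result to Elasticsearch is left out.

```python
import json


def to_bulk_body(index, docs):
    """Build an NDJSON _bulk body with one "index" action per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


medicines = [
    {"id": 1, "name": "Paracetamol",
     "text_description": "An analgesic and antipyretic that does NOT require a prescription."},
    {"id": 3, "name": "Amoxicillin",
     "text_description": "An antibiotic that requires a prescription."},
]
bulk_body = to_bulk_body("medicines", medicines)
```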

Response:

{
   "errors": false,
   "took": 34732020848,
   "items": [
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "mYoeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 0,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "mooeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 1,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "m4oeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 2,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "nIoeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 3,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "nYoeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 4,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "nooeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 5,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "n4oeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 6,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "oIoeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 7,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "oYoeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 8,
     	"_primary_term": 1,
     	"status": 201
   	}
 	},
 	{
   	"index": {
     	"_index": "medicines",
     	"_id": "oooeMpQBF7lnCNFTfdn2",
     	"_version": 1,
     	"result": "created",
     	"_shards": {
       	"total": 2,
       	"successful": 2,
       	"failed": 0
     	},
     	"_seq_no": 9,
     	"_primary_term": 1,
     	"status": 201
   	}
 	}
   ]
 }
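In the response above, "errors": false means that every item was indexed. For larger loads it can help to check this in code; the sketch below walks the "items" array and reports any entry with an error or a non-2xx status. The helper name failed_bulk_items is hypothetical, and the input is assumed to be the parsed JSON response shown above.

```python
def failed_bulk_items(response):
    """Return (action, _id, error) tuples for bulk items that did not succeed."""
    failures = []
    for item in response.get("items", []):
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            if result.get("status", 0) >= 300 or "error" in result:
                failures.append((action, result.get("_id"), result.get("error")))
    return failures
```

An empty list means that all documents were created, matching the 201 statuses in the response above.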

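Asking questions using Playground

Playground is a Kibana tool that lets you quickly create a RAG system using Elasticsearch indices and an LLM provider. You can read this article to learn more.

Connecting the local LLM to Playground

We first need to create a connector that uses the public URL we just created. In Kibana, go to Search > Playground and click "Connect to an LLM".

This opens a menu on the left side of the Kibana interface. There, click "OpenAI".

We can now start configuring the OpenAI connector.

Go to "Connector settings" and, for the OpenAI provider, select "Other (OpenAI Compatible Service)":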

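Now, let's configure the other fields. In this example, we'll name our model "medicines-llm". In the URL field, use the ngrok-generated URL followed by /v1/chat/completions. In the "Default model" field, select "llama3.2". We won't use an API key, so just enter any random text to continue: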
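Before saving the connector, it can be useful to confirm that the URL and model name respond as an OpenAI-compatible service would. The sketch below builds the same URL described above and sends a single chat message using only the standard library. The function names are hypothetical, the base URL is the placeholder from the ngrok output, and the Authorization value is arbitrary text because no API key is used here.

```python
import json
import urllib.request


def chat_completions_url(base_url):
    """Append the OpenAI-compatible chat path to the ngrok base URL."""
    return base_url.rstrip("/") + "/v1/chat/completions"


def build_chat_payload(question, model="llama3.2"):
    """Build a minimal chat completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": question}]}


def check_connector(base_url, question="What is the capital of France?"):
    """Send one chat message and return the model's reply text."""
    request = urllib.request.Request(
        chat_completions_url(base_url),
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer any-text",  # placeholder, no real key is needed
        },
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

If check_connector("https://your-ngrok-url.ngrok-free.app") returns an answer, the same URL and model name should work in the connector form.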

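Click "Save", then click "Add data sources" to add the medicines index:

Great! We can now use Playground with a locally running LLM as the RAG engine.

Before testing, let's add more specific instructions to the agent and raise the number of documents sent to the model to 10, so that the answer has as many documents available as possible. The context field will be semantic_field, which includes the name and description of each medicine thanks to the copy_to property.

Now let's ask the question "Can I buy Clonazepam without a prescription?" and see what happens: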

https://drive.google.com/file/d/1WOg9yJ2Vs5ugmXk9_K9giZJypB8jbxuN/view?usp=drive_link

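As expected, we got the correct answer.

Next steps

The next step is to create your own application! Playground provides a Python script that you can run on your own machine and customize to your needs, for example by putting it behind a FastAPI server to create a medicines QA chatbot consumed by your UI.

You can find this code by clicking the View code button at the top right of Playground:

You can generate the ES_API_KEY environment variable required by the code under Endpoints & API keys.

For this particular example, the code is as follows: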

## Install the required packages
## pip install -qU elasticsearch openai
import os

from elasticsearch import Elasticsearch
from openai import OpenAI

es_client = Elasticsearch(
    "https://your-deployment.us-central1.gcp.cloud.es.io:443",
    api_key=os.environ["ES_API_KEY"]
)

openai_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

index_source_fields = {
    "medicines": [
        "semantic_field"
    ]
}


def get_elasticsearch_results():
    ## NOTE: `query` is not defined in this generated snippet;
    ## a later step in this article hard-codes the question here.
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "semantic_field.inference.chunks",
                        "query": {
                            "sparse_vector": {
                                "inference_id": "medicines-inference",
                                "field": "semantic_field.inference.chunks.embeddings",
                                "query": query
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "medicines.semantic_field",
                            "_source": [
                                "semantic_field.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="medicines", body=es_query)
    return result["hits"]["hits"]


def create_openai_prompt(results):
    context = ""
    for hit in results:
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(inner_hit['_source']['text'] for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits'])
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
  Instructions:
  - You are an assistant specializing in answering questions about the sale of medicines.
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  Context:
  {context}
  """
    return prompt


def generate_openai_completion(user_prompt, question):
    ## The retrieved context goes in the system message, the question in the user message
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    question = "my question"
    elasticsearch_results = get_elasticsearch_results()
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)

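To make it work with Ollama, you must change the OpenAI client to connect to the Ollama server instead of the OpenAI server. You can find the full list of OpenAI examples and compatible endpoints here.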

openai_client = OpenAI(
    # you can use http://localhost:11434/v1/ if running this code locally.
    base_url='https://your-ngrok-url.ngrok-free.app/v1/',
    # required but ignored
    api_key='ollama',
)

Also change the model to llama3.2 when calling the completion method:

def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="llama3.2",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content

Let's add the question "Can I buy Clonazepam without a prescription?" to the Elasticsearch query:

def get_elasticsearch_results():
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "semantic_field.inference.chunks",
                        "query": {
                            "sparse_vector": {
                                "inference_id": "medicines-inference",
                                "field": "semantic_field.inference.chunks.embeddings",
                                "query": "Can I buy Clonazepam without a prescription?"
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "medicines.semantic_field",
                            "_source": [
                                "semantic_field.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="medicines", body=es_query)
    return result["hits"]["hits"]

Let's also print some markers around the completion call so we can confirm that the Elasticsearch results are being sent as part of the question context:

if __name__ == "__main__":
    question = "Can I buy Clonazepam without a prescription?"
    elasticsearch_results = get_elasticsearch_results()
    context_prompt = create_openai_prompt(elasticsearch_results)
    print("========== Context Prompt START ==========")
    print(context_prompt)
    print("========== Context Prompt END ==========")
    print("========== Ollama Completion START ==========")
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)
    print("========== Ollama Completion END ==========")

Now let's run the commands:

pip install -qU elasticsearch openai

python main.py

You should see something like this:

========== Context Prompt START ==========
  Instructions:
  - You are an assistant specializing in answering questions about the sale of medicines.
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  Context:
  Clonazepam
 ---
An antiepileptic medication that requires a prescription.A nonsteroidal anti-inflammatory drug (NSAID) available WITHOUT a prescription.
 ---
IbuprofenAn anxiolytic medication that strictly requires a prescription.
 ---
Lorazepam


========== Context Prompt END ==========
========== Ollama Completion START ==========
No, you cannot buy Clonazepam over-the-counter (OTC) without a prescription [1]. It is classified as a controlled substance in the United States due to its potential for dependence and abuse. Therefore, it can only be obtained from a licensed healthcare provider who will issue a prescription for this medication.
========== Ollama Completion END ==========
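In the printed context, the chunks of one hit run straight into the next (for example "prescription.A nonsteroidal"), because create_openai_prompt appends each hit's joined chunks without a separator between hits. If you want cleaner context, one possible variant is sketched below. The name build_context is hypothetical and this is not the code that Playground generates; it only assumes the hit structure used by create_openai_prompt above.

```python
def build_context(hits, index_source_fields, separator="\n --- \n"):
    """Join every retrieved chunk with the same separator, across all hits."""
    blocks = []
    for hit in hits:
        source_field = index_source_fields[hit["_index"]][0]
        inner_hit_path = f"{hit['_index']}.{source_field}"
        inner_hits = hit.get("inner_hits", {})
        if inner_hit_path in inner_hits:
            # semantic_text matches: take the chunk text from the inner hits
            chunks = inner_hits[inner_hit_path]["hits"]["hits"]
            blocks.extend(chunk["_source"]["text"] for chunk in chunks)
        else:
            # fall back to the raw source field
            blocks.append(hit["_source"][source_field])
    return separator.join(blocks)
```

Swapping this in for the context loop keeps the prompt template unchanged while giving the model clearly delimited passages.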

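Conclusion

In this article, we saw the power and versatility of tools like Ollama when combined with the Elasticsearch inference API and Playground.

After a few simple steps, we had a working RAG application with chat powered by an LLM running on our own infrastructure at no cost. This also gives us more control over resources and sensitive information, as well as access to a variety of models for different tasks.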

Original article: Using Ollama with the Inference API - Elasticsearch Labs