Sending Prometheus metrics to Elasticsearch with Remote Write

Elasticsearch natively supports Prometheus Remote Write. Add a single remote_write block to your Prometheus configuration and use Elasticsearch as Prometheus-compatible long-term storage.

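Prometheus has a standard protocol for shipping metrics to external storage: Remote Write. Elasticsearch now implements this protocol natively, so you can use it as a remote_write target with a single configuration block.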

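This brings your Prometheus metrics into the same cluster that can also hold your logs, traces, and other data. One storage backend, one set of access controls, one place to query.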

Why store Prometheus metrics in Elasticsearch?

Prometheus local storage is designed for short-term retention, typically 15 to 30 days. Keeping data for longer requires a remote storage backend.

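Elasticsearch time series data streams (TSDS) are built for efficient long-term metrics storage: automatic rollover, time-based partitioning, compression through index sorting, and downsampling as data ages to reduce storage costs. Your Prometheus scrape configuration does not need to change.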

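Recent Elasticsearch releases have significantly reduced the storage footprint of metrics data. A dedicated article with the specific numbers is coming soon.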

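On the query side, ES|QL has absorbed PromQL: the built-in PROMQL function runs your existing queries unchanged, and the full power of ES|QL is available when you need to join, aggregate, or transform across datasets.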

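Because metrics live in the same system as logs, traces, and profiling data, you can correlate signals in a single query instead of investigating across separate systems.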

How it works

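For a closer look at what happens inside Elasticsearch when a Remote Write request arrives (protobuf parsing, metric type inference, TSDS mapping, and data stream routing), see the companion article on how Elasticsearch ingests Prometheus Remote Write.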

Prometheus sends metrics to Elasticsearch over the standard Remote Write protocol (v1). The endpoint accepts protobuf-encoded, snappy-compressed WriteRequest payloads.

Each sample is written as one Elasticsearch document into a predefined time series data stream. Prometheus labels become TSDS dimensions. Metric values are stored in typed fields under metrics.<metric_name>.

Elasticsearch infers the metric type (counter or gauge) from naming conventions. Metrics whose names end in _total, _sum, _count, or _bucket are treated as counters; everything else is treated as a gauge.
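
The suffix rule above can be sketched in a few lines of Python. This is only an illustration of the naming convention as described here, not the Elasticsearch implementation; the function name infer_metric_type is hypothetical.

```python
# Suffixes that, per the rule described above, mark a metric as a counter.
COUNTER_SUFFIXES = ("_total", "_sum", "_count", "_bucket")


def infer_metric_type(metric_name: str) -> str:
    """Return "counter" or "gauge" based only on the metric name's suffix."""
    # str.endswith accepts a tuple and matches if any suffix applies.
    if metric_name.endswith(COUNTER_SUFFIXES):
        return "counter"
    return "gauge"


if __name__ == "__main__":
    for name in ("prometheus_http_requests_total", "process_resident_memory_bytes"):
        print(name, "->", infer_metric_type(name))
```

A quick check like this can help you spot metrics whose names do not follow the convention before you start sending them.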

Setting it up

Step 1: Get an Elasticsearch endpoint

You need an Elasticsearch cluster with the Prometheus endpoint enabled. The easiest option is Elastic Cloud Serverless, where it works out of the box.

For Serverless: log in to cloud.elastic.co, create an Observability project, and copy the Elasticsearch endpoint from the project settings page. The endpoint looks like this: https://<project-id>.es.<region>.<provider>.elastic.cloud

Step 2: Create an API key

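Create an API key that is scoped to writing to the metrics data streams only. In your Elastic Cloud Serverless project, go to Admin and settings (the gear icon at the bottom of the left navigation), then select API keys.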

In the Control security privileges section, use the following role descriptor:

{
  "ingest": {
    "indices": [
      {
        "names": ["metrics-*"],
        "privileges": ["auto_configure", "create_doc"]
      }
    ]
  }
}

Copy the key value before you close the dialog. Once it is closed, you cannot retrieve the key again.
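
If you prefer to script this step, the same role descriptor can be wrapped in a request body for the Elasticsearch create API key API (POST /_security/api_key). The sketch below only builds and prints that body. The helper name build_api_key_request and the key name prometheus-remote-write are illustrative assumptions, and whether you can call this API directly depends on your deployment and your own credentials.

```python
import json


def build_api_key_request(name: str) -> dict:
    """Build a create-API-key request body using the role descriptor shown above."""
    return {
        "name": name,  # hypothetical key name, choose your own
        "role_descriptors": {
            "ingest": {
                "indices": [
                    {
                        "names": ["metrics-*"],
                        "privileges": ["auto_configure", "create_doc"],
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    body = build_api_key_request("prometheus-remote-write")
    print(json.dumps(body, indent=2))
```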

Step 3: Configure Prometheus

Add the following remote_write block to your prometheus.yml:

remote_write:
  - url: "https://YOUR_ES_ENDPOINT/_prometheus/api/v1/write"
    authorization:
      type: ApiKey
      credentials: YOUR_API_KEY

That's it. Prometheus starts sending metrics to Elasticsearch at the next scrape interval.

If you use Grafana Alloy instead of Prometheus, the equivalent configuration is:

prometheus.remote_write "elasticsearch" {
  endpoint {
    url = "https://YOUR_ES_ENDPOINT/_prometheus/api/v1/write"
    headers = {"Authorization" = "ApiKey YOUR_API_KEY"}
  }
}

Routing metrics to different data streams

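By default, all metrics land in metrics-generic.prometheus-default. You can route metrics from different environments or teams to separate data streams by using the dataset and namespace path segments in the URL.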

There are three URL patterns:

/_prometheus/api/v1/write → routes to metrics-generic.prometheus-default
/_prometheus/metrics/{dataset}/api/v1/write → routes to metrics-{dataset}.prometheus-default
/_prometheus/metrics/{dataset}/{namespace}/api/v1/write → routes to metrics-{dataset}.prometheus-{namespace}

For example, /_prometheus/metrics/infrastructure/production/api/v1/write routes data to metrics-infrastructure.prometheus-production.
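
To make the routing rule concrete, here is a small Python sketch that maps a request path to the data stream name using the three patterns listed above. It mirrors the rule only as this article describes it; the function name target_data_stream and the regular expression are illustrative assumptions, not Elasticsearch code.

```python
import re

# Matches the three URL patterns above. The dataset and namespace segments are
# optional and each is assumed to be a single path segment.
_WRITE_PATH = re.compile(
    r"^/_prometheus"
    r"(?:/metrics/(?P<dataset>[^/]+)(?:/(?P<namespace>[^/]+))?)?"
    r"/api/v1/write$"
)


def target_data_stream(path: str) -> str:
    """Return the data stream a Remote Write path would route to."""
    match = _WRITE_PATH.match(path)
    if match is None:
        raise ValueError(f"not a Remote Write path: {path!r}")
    # Missing segments fall back to the defaults named in the article.
    dataset = match.group("dataset") or "generic"
    namespace = match.group("namespace") or "default"
    return f"metrics-{dataset}.prometheus-{namespace}"


if __name__ == "__main__":
    print(target_data_stream("/_prometheus/metrics/infrastructure/production/api/v1/write"))
```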

This is useful for separating production metrics from staging metrics, or for giving each team its own data stream with its own lifecycle policy.

What gets stored

Here is what the document for a single sample looks like in Elasticsearch:

{
  "@timestamp": "2026-04-02T10:30:00.000Z",
  "data_stream": {
    "type": "metrics",
    "dataset": "generic.prometheus",
    "namespace": "default"
  },
  "labels": {
    "__name__": "prometheus_http_requests_total",
    "handler": "/api/v1/query",
    "code": "200",
    "instance": "localhost:9090",
    "job": "prometheus"
  },
  "metrics": {
    "prometheus_http_requests_total": 42
  }
}

Labels are mapped to keyword fields and serve as TSDS dimensions. Metric values are stored under metrics.<metric_name> with the inferred time_series_metric type (counter or gauge).
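
The shape of that document can be reproduced with a short Python sketch. It assumes a sample arrives as a label set, a value, and a millisecond timestamp, which is how Remote Write v1 represents samples. The function name sample_to_document is hypothetical, and the code illustrates the document layout only, not how Elasticsearch builds it internally.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sample_to_document(labels, value, timestamp_ms,
                       dataset="generic.prometheus", namespace="default"):
    """Build a dict shaped like the stored document shown above."""
    metric_name = labels["__name__"]  # Prometheus keeps the metric name in this label
    ts = _EPOCH + timedelta(milliseconds=timestamp_ms)
    return {
        "@timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "data_stream": {"type": "metrics", "dataset": dataset, "namespace": namespace},
        "labels": dict(labels),            # every label becomes a dimension
        "metrics": {metric_name: value},   # the value is keyed by metric name
    }


if __name__ == "__main__":
    when = datetime(2026, 4, 2, 10, 30, tzinfo=timezone.utc)
    doc = sample_to_document(
        {"__name__": "prometheus_http_requests_total", "code": "200", "job": "prometheus"},
        42,
        int(when.timestamp() * 1000),
    )
    print(doc)
```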

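Elasticsearch installs a built-in index template that matches metrics-*.prometheus-*. The template configures TSDS mode, passthrough dimension container objects, and a field limit of 10,000. You can change the field limit with a custom component template (see the section on custom metric type inference below for how to use custom templates). You do not need to create any templates or mappings yourself.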

Custom metric type inference

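Metric type inference is based on naming conventions, so metrics that do not follow the Prometheus naming conventions may be misclassified. You can override the default behavior by creating the metrics-prometheus@custom component template with your own dynamic templates. For example, to mark every *_counter metric as a counter: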

{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "counter": {
            "path_match": "metrics.*_counter",
            "mapping": {
              "type": "double",
              "time_series_metric": "counter"
            }
          }
        }
      ]
    }
  }
}

Custom rules are merged with the built-in patterns, so the defaults still apply to any metrics you have not overridden.

Current limitations

Only Remote Write v1 is supported for now. v2, which introduces native histograms and exemplars, is not supported yet.

Staleness markers (the special NaN value Prometheus uses to signal that a time series has disappeared) are not stored yet and are not recognized at query time.

Non-finite values (NaN, Infinity) are silently dropped.
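
Because the drop is silent, it can be useful to check on the client side how many samples would be affected. The sketch below is one way to do that, assuming samples are available as (timestamp_ms, value) pairs; the function name split_finite is hypothetical and nothing here is part of Elasticsearch or Prometheus.

```python
import math


def split_finite(samples):
    """Split (timestamp_ms, value) pairs into (kept, dropped) lists.

    A value counts as non-finite if it is NaN, +Infinity, or -Infinity.
    """
    kept, dropped = [], []
    for timestamp_ms, value in samples:
        if math.isfinite(value):
            kept.append((timestamp_ms, value))
        else:
            dropped.append((timestamp_ms, value))
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = split_finite([(1, 1.0), (2, float("nan")), (3, float("inf"))])
    print(f"{len(kept)} kept, {len(dropped)} would be dropped")
```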

Get started

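The Prometheus Remote Write endpoint is available in Elasticsearch Serverless today, with no extra configuration required. For a local cluster, you can use start-local to bring up a single-node cluster in a few minutes.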

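Once metrics are flowing, you can query them with ES|QL: either through the built-in PROMQL function for PromQL-compatible queries, or with native ES|QL queries that correlate metrics with logs and traces in the same store.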

Original article: https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch