Article 34: Elasticsearch Script in Practice

Reading context data in scripts:

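In Elasticsearch, a Painless script runs in a completely different context depending on where it is used: the built-in variables it can call directly, the way it reads field values, and the APIs available to it all differ strictly. The scenarios used most in everyday development are script fields in a query (script_fields), runtime fields (runtime_mappings), score scripts (script_score), aggregation scripts, and document update scripts. This article summarizes, for each one, how to read values, which variables are available, and what to avoid.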

Scenario 1: script_fields, temporary script fields in a query

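Where it is used: at the top level of a _search request, to compute temporary fields at query time and return them with the results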

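Built-in variables available directly
- doc['field_name']: reads the value of a structured field
- params: receives custom parameters passed in from outside
Not directly available
- No default _score and no _doc built-in variable
- You cannot write _source directly; you must write params._source
Execution characteristics
- Only affects what the query returns; it takes no part in scoring and does not modify data
- The computed values appear in each hit's fields object because they are declared in script_fields; the fields: ["*"] option does not pick them up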
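A minimal sketch of a script_fields request body, written as a Python dict and printed as JSON so it can be pasted after GET test_script/_search in Kibana Dev Tools. It assumes the test_script index created later in this article; the field name total_price and the parameter factor are hypothetical. The script only uses doc[...] and params, the two variables this context provides.

```python
import json


def build_script_fields_body(factor):
    """Search body that returns a computed total_price (hypothetical name) for every hit."""
    return {
        "query": {"match_all": {}},
        "script_fields": {
            "total_price": {
                "script": {
                    # doc[...] reads doc values; params carries the caller's input
                    "source": "doc['price'].value * doc['quantity'].value * params.factor",
                    "params": {"factor": factor},
                }
            }
        },
    }


# Print the JSON body to paste into Kibana Dev Tools
print(json.dumps(build_script_fields_body(1.0), indent=2))
```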

Scenario 2: runtime_mappings, runtime field scripts

Where it is used: defines dynamic fields inside a query, equivalent to temporarily adding a mapped field

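Built-in variables available directly
- doc['field_name']: the main way to read values
- emit(): the fixed syntax for outputting the field's value
Execution characteristics
- Recognized by ES as a real field, so fields: ["*"] returns it directly
- Can be used in sorting, aggregations, and filtering, just like a field defined in the regular mapping
- More flexible than script_fields because it can be queried, sorted, and aggregated; it is still computed at search time for every document it touches, so a field needed at high frequency is usually better indexed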
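A minimal sketch of a search body that defines a runtime field and then returns and sorts on it, written as a Python dict. The runtime field name total_price is hypothetical, and the price and quantity fields assume the test_script index used later in this article. Note that the script hands its result back through emit() rather than a return value.

```python
import json


def build_runtime_field_body():
    """Search body that defines a runtime field, returns it, and sorts on it."""
    return {
        "runtime_mappings": {
            "total_price": {  # hypothetical runtime field name
                "type": "double",
                "script": {
                    # runtime field scripts output values through emit()
                    "source": "emit(doc['price'].value * doc['quantity'].value)"
                },
            }
        },
        "fields": ["total_price"],          # returned like a mapped field
        "sort": [{"total_price": "desc"}],  # usable in sorting like a mapped field
    }


print(json.dumps(build_runtime_field_body(), indent=2))
```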

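Scenario 3: script_score score scripts (key topic, used most often)

Where it is used: replaces a document's default relevance score so that custom ranking weights can be applied

Built-in variables usable directly without a prefix (complete list)
1. _score: the document's original relevance score from the query match; used most often
2. doc['field_name']: fast access to structured fields
3. _doc: the document metadata object, which exposes built-in methods
  - _doc.id(): returns the document ID
  - _doc.index(): returns the name of the index the document belongs to
  - _doc.version(): returns the document version
4. params: receives custom parameters passed in from outside
Must never be used directly
- Do not write _source directly; read the original data through params._source
Design intent: _score + doc[] is preferred to keep scoring fast, and direct _source access is deliberately restricted so that query performance does not collapse
Example: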

GET index/_search
{
  "query": {
    "script_score": {
      "query": {"match_all": {}},
      "script": {
        "source": "_score + doc['price'].value + _doc.version()",
        "params": {"num":10}
      }
    }
  }
}
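In the example above, params.num is passed but never read. A minimal sketch of a body whose script actually reads its weight from params (the parameter name weight is hypothetical), so the weight can be tuned without changing the script source. Elasticsearch rejects negative scores from script_score, so the helper refuses a negative weight, assuming prices are never negative.

```python
import json


def build_script_score_body(weight):
    """script_score body whose weight comes from params (hypothetical name)."""
    if weight < 0:
        # assuming non-negative prices, a non-negative weight keeps scores >= 0
        raise ValueError("weight must be non-negative")
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # _score and doc[...] are usable without a prefix in this context
                    "source": "_score + doc['price'].value * params.weight",
                    "params": {"weight": weight},
                },
            }
        }
    }


print(json.dumps(build_script_score_body(0.1), indent=2))
```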

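Scenario 4: update scripts (update by query)

Where it is used: updating documents in bulk

Core way to read values
- ctx._source: a dedicated variable for reading and modifying the document's original fields directly
Dedicated built-in context
- ctx.op: the operation type (index by default; set it to noop to skip the document or to delete to delete it)
- ctx._id: the document ID
- ctx._index: the index name
Syntax example

POST index/_update_by_query
{
  "script": {
    "source": "ctx._source.price += 10"
  }
}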
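A minimal sketch of an _update_by_query body that uses ctx.op, written as a Python dict. The thresholds and the parameter names min_price and add are hypothetical. Filtering in the query is usually cheaper; the check inside the script here only illustrates how setting ctx.op to noop leaves a document untouched.

```python
import json


def build_conditional_update_body(min_price, add):
    """_update_by_query body: raise the price of items at or above min_price, skip the rest."""
    painless = (
        "if (ctx._source.price >= params.min_price) {"
        " ctx._source.price += params.add;"
        " } else {"
        " ctx.op = 'noop';"  # noop: the document is left unchanged
        " }"
    )
    return {
        "query": {"match_all": {}},
        "script": {"source": painless, "params": {"min_price": min_price, "add": add}},
    }


print(json.dumps(build_conditional_update_body(150, 10), indent=2))
```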

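Scenario 5: aggregation scripts (aggs script)

Where it is used: numeric calculations inside aggregations

Main way to read values: doc['field_name'].value
There is no _score variable; the script only performs statistical calculations
Custom params are supported; avoid _source, which lowers aggregation efficiency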
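A minimal sketch of an aggregation script that reads doc values and a parameter, written as a Python dict. The aggregation name revenue and the parameter rate are hypothetical; the price and quantity fields assume the test_script index used later in this article.

```python
import json


def build_revenue_agg_body(rate):
    """Aggregation-only search body: sums price * quantity * rate over all documents."""
    return {
        "size": 0,  # no hits, only aggregation results
        "aggs": {
            "revenue": {  # hypothetical aggregation name
                "sum": {
                    "script": {
                        # only doc[...] and params here; there is no _score in this context
                        "source": "doc['price'].value * doc['quantity'].value * params.rate",
                        "params": {"rate": rate},
                    }
                }
            }
        },
    }


print(json.dumps(build_revenue_agg_body(0.95), indent=2))
```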

GET _script_language

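An Elasticsearch metadata API that lists the scripting languages the cluster supports (such as painless and expression), the execution contexts each language can run in (such as bucket_aggregation, score, and update), and the allowed script types (inline/stored). Think of it as the entry point to the cluster's scripting capabilities.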
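A small helper for reading the response of GET _script_language, assuming the response carries a types_allowed list and a language_contexts list of {"language", "contexts"} entries. The sample dict is a trimmed, made-up excerpt for illustration, not real cluster output.

```python
def contexts_for(response, language):
    """Return the set of contexts a language supports in a GET _script_language response."""
    for entry in response.get("language_contexts", []):
        if entry.get("language") == language:
            return set(entry.get("contexts", []))
    return set()  # language not listed


# trimmed, illustrative excerpt of a response
sample = {
    "types_allowed": ["inline", "stored"],
    "language_contexts": [
        {"language": "expression", "contexts": ["score"]},
        {"language": "painless", "contexts": ["score", "update", "bucket_aggregation"]},
    ],
}

print(sorted(contexts_for(sample, "painless")))
```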

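Script use cases

Data ingestion (Ingest Pipeline): scripts run in an ingest pipeline while documents are written to an index, to clean fields, convert formats, and fill in values automatically (for example, computing a hash of the data or concatenating field values).
Data updates (Update / Ingest)
- Single-document update: modify field values with a script through the _update API
- Bulk update: run a script on every document matching a query through _update_by_query
- An ingest pipeline can also apply dynamic update logic at write time
Data queries (Query / Template)
- Custom filtering: a script query implements complex filter conditions (such as combined checks across several fields or dynamic thresholds)
- Custom scoring: a script_score script changes a document's relevance score to implement business-specific ranking
- Query templates: package the query as a search template for parameterized, reusable query logic
Data aggregation (aggs): scripts in aggregations implement complex grouping, calculation, and transformation logic, for example:
- Bucket scripts (bucket_script): run a second calculation on aggregation results (such as a difference or a ratio)
- Aggregation scripts (script): compute custom values to aggregate on, such as values that no single stored field holds
Data transformation (Transform): scripts in Elasticsearch Transform jobs clean, convert, and aggregate index data into a new destination index, commonly for data-warehouse modeling and building wide tables.
Reindexing (Reindex): scripts in a _reindex operation change document structure, modify field values, or filter data, for cross-index migration, field mapping changes, and data cleanup.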

The examples below cover each of these use cases.

Setup: first create a test index (shared by all examples)

PUT /test_script
{
  "mappings": {
    "properties": {
      "user_id": {"type": "keyword"},
      "price": {"type": "double"},
      "quantity": {"type": "integer"},
      "order_time": {"type": "date"},
      "tags": {"type": "keyword"}
    }
  },
  "settings": {
    "number_of_shards": 1
  }
}

// Index with an explicit _id (u001) so that the _update request in section 2.1 can address it
PUT /test_script/_doc/u001
{
  "user_id": "u001",
  "price": 100,
  "quantity": 2,
  "order_time": "2025-01-01T10:00:00Z",
  "tags": ["electronics"]
}

1. Data ingestion: Ingest Pipeline script

Scenario: automatically compute the total price and add the write time when a document is written

PUT /_ingest/pipeline/script_write_pipeline
{
  "processors": [
    {
      "script": {
        "source": """
          // Compute the total price
          ctx.total_price = ctx.price * ctx.quantity;
          // Record the write time
          ctx.write_time = new Date();
        """
      }
    }
  ]
}

// Test ingestion through the pipeline
POST /test_script/_doc?pipeline=script_write_pipeline
{
  "user_id": "u002",
  "price": 200,
  "quantity": 3,
  "order_time": "2025-01-02T14:30:00Z",
  "tags": ["clothes"]
}
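Before sending real data, a pipeline can be tried with POST /_ingest/pipeline/script_write_pipeline/_simulate, which takes a list of sample documents under docs. A minimal sketch that builds that body and mirrors the pipeline's total-price arithmetic in Python, so the expected value is known in advance (the write_time field is not mirrored here).

```python
import json


def build_simulate_body(docs):
    """Body for POST /_ingest/pipeline/script_write_pipeline/_simulate."""
    return {"docs": [{"_source": d} for d in docs]}


def expected_total_price(doc):
    # mirrors ctx.total_price = ctx.price * ctx.quantity
    return doc["price"] * doc["quantity"]


order = {"user_id": "u002", "price": 200, "quantity": 3}
print(json.dumps(build_simulate_body([order]), indent=2))
print(expected_total_price(order))  # 600
```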

2. Data updates: Update / Ingest scripts

2.1 Single update: modify a field value

POST /test_script/_update/u001
{
  "script": {
    "source": "ctx._source.price += params.add_price",
    "params": {
      "add_price": 50
    }
  }
}

2.2 Bulk update: recompute the total price of every document

POST /test_script/_update_by_query
{
  "script": {
    "source": "ctx._source.total_price = ctx._source.price * ctx._source.quantity"
  },
  "query": {
    "match_all": {}
  }
}

3. Data queries: Query / Template scripts

3.1 Custom filter: find orders whose total price (price * quantity) exceeds 300

GET /test_script/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['price'].value * doc['quantity'].value > 300"
            }
          }
        }
      ]
    }
  }
}
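The filter is a strict comparison, which matters at the boundary. A small Python mirror of the script condition, assuming u001's price was raised from 100 to 150 by the 2.1 update and u002 keeps its indexed values: u001 totals exactly 300, so only u002 passes.

```python
orders = [
    {"user_id": "u001", "price": 150.0, "quantity": 2},  # assumes the 2.1 update ran
    {"user_id": "u002", "price": 200.0, "quantity": 3},
]


def passes_filter(doc):
    # mirrors doc['price'].value * doc['quantity'].value > 300
    return doc["price"] * doc["quantity"] > 300


print([d["user_id"] for d in orders if passes_filter(d)])  # ['u002']
```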

3.2 Custom scoring: boost documents by price

GET /test_script/_search
{
  "query": {
    "script_score": {
      "query": {"match_all": {}},
      "script": {
        "source": "_score + doc['price'].value * 0.1"
      }
    }
  }
}
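A worked example of the score formula. match_all gives every document a _score of 1.0, so the final score is 1.0 + price * 0.1. The Python mirror below assumes the same two prices as above (150 and 200) and shows the resulting order.

```python
import math


def script_score(base_score, price):
    # mirrors "_score + doc['price'].value * 0.1"
    return base_score + price * 0.1


prices = {"u001": 150.0, "u002": 200.0}
# match_all assigns a constant _score of 1.0 to every document
scores = {uid: script_score(1.0, p) for uid, p in prices.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)  # ['u002', 'u001']
```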

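4. Data aggregation: aggs scripts

Scenario: group orders by day, compute each group's average unit price and total amount, then use a bucket script to compute the difference between the two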

GET /test_script/_search
{
  "size": 0,
  "aggs": {
    "date_bucket": {
      "date_histogram": {
        "field": "order_time",
        "calendar_interval": "day"
      },
      "aggs": {
        "avg_price": {
          "avg": {"field": "price"}
        },
        "sum_total": {
          "sum": {"script": "doc['price'].value * doc['quantity'].value"}
        },
        "price_diff": {
          "bucket_script": {
            "buckets_path": {
              "avg": "avg_price",
              "sum": "sum_total"
            },
            "script": "params.sum - params.avg"
          }
        }
      }
    }
  }
}
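To see what each daily bucket holds before the bucket script runs, here is a small Python mirror of the date_histogram and its two sub-aggregations. It assumes UTC timestamps ending in Z (so the first ten characters are the calendar day) and the same two sample orders, with u001 at price 150 after the 2.1 update.

```python
from collections import defaultdict

orders = [
    {"order_time": "2025-01-01T10:00:00Z", "price": 150.0, "quantity": 2},
    {"order_time": "2025-01-02T14:30:00Z", "price": 200.0, "quantity": 3},
]


def day_buckets(docs):
    """Mirror of date_histogram (1 day, UTC) with avg_price and sum_total sub-aggregations."""
    grouped = defaultdict(list)
    for d in docs:
        grouped[d["order_time"][:10]].append(d)  # UTC day key, e.g. "2025-01-01"
    result = {}
    for day, items in sorted(grouped.items()):
        avg_price = sum(i["price"] for i in items) / len(items)
        sum_total = sum(i["price"] * i["quantity"] for i in items)
        result[day] = {"avg_price": avg_price, "sum_total": sum_total}
    return result


for day, values in day_buckets(orders).items():
    print(day, values)
```

The bucket_script then receives avg_price and sum_total of one bucket at a time through buckets_path.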

5. Data transformation: Transform script

Scenario: create a Transform job that aggregates the source index by day into a summary (wide) table

// 1. Create the destination index
PUT /test_transform_target
{
  "mappings": {
    "properties": {
      "order_date": {"type": "date"},
      "total_orders": {"type": "integer"},
      "avg_price": {"type": "double"},
      "total_revenue": {"type": "double"}
    }
  }
}

// 2. Create the Transform job
PUT /_transform/order_transform
{
  "source": {
    "index": "test_script"
  },
  "dest": {
    "index": "test_transform_target"
  },
  "pivot": {
    "group_by": {
      "order_date": {
        "date_histogram": {
          "field": "order_time",
          "calendar_interval": "day"
        }
      }
    },
    "aggregations": {
      "total_orders": {"value_count": {"field": "user_id"}},
      "avg_price": {"avg": {"field": "price"}},
      "total_revenue": {
        "sum": {
          "script": "doc['price'].value * doc['quantity'].value"
        }
      }
    }
  }
}

// 3. Start the Transform job
POST /_transform/order_transform/_start
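Before starting a Transform job, its output can be checked with POST _transform/_preview, which accepts the same source and pivot definition. A minimal sketch of that body as a Python dict. In a transform, group_by is an object keyed by the group name (order_date here, chosen to match the destination mapping), not an array.

```python
import json


def build_transform_preview_body():
    """Body for POST _transform/_preview, mirroring the order_transform pivot."""
    return {
        "source": {"index": "test_script"},
        "pivot": {
            "group_by": {
                # the group name becomes a field in the output documents
                "order_date": {
                    "date_histogram": {"field": "order_time", "calendar_interval": "day"}
                }
            },
            "aggregations": {
                "total_orders": {"value_count": {"field": "user_id"}},
                "avg_price": {"avg": {"field": "price"}},
                "total_revenue": {
                    "sum": {"script": {"source": "doc['price'].value * doc['quantity'].value"}}
                },
            },
        },
    }


print(json.dumps(build_transform_preview_body(), indent=2))
```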

6. Reindexing: Reindex script

Scenario: migrate data across indices, renaming fields and adding a new field along the way

// Destination index
PUT /test_reindex_target
{
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "amount": {"type": "double"},
      "count": {"type": "integer"},
      "order_date": {"type": "date"},
      "is_high_value": {"type": "boolean"}
    }
  }
}

// Run reindex with a transformation script
POST /_reindex
{
  "source": {
    "index": "test_script"
  },
  "dest": {
    "index": "test_reindex_target"
  },
  "script": {
    "source": """
      // Rename fields
      ctx._source.id = ctx._source.user_id;
      ctx._source.amount = ctx._source.price;
      ctx._source.count = ctx._source.quantity;
      ctx._source.order_date = ctx._source.order_time;
      // New field: flag high-value orders
      ctx._source.is_high_value = (ctx._source.price * ctx._source.quantity) > 300;
      // Remove the original fields, including order_time now that it is copied to order_date
      ctx._source.remove('user_id');
      ctx._source.remove('price');
      ctx._source.remove('quantity');
      ctx._source.remove('order_time');
    """
  }
}
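A small Python mirror of what the reindex script is meant to do to one document: treat user_id, price, quantity, and order_time as renames (copy to the new name, drop the old key), add is_high_value, and keep any other field such as tags unchanged. The input is the u002 sample document; it is a local model for checking the expected output shape, not how Elasticsearch runs the script.

```python
def reindex_transform(source):
    """Python mirror of the reindex script: rename four fields and add is_high_value."""
    doc = dict(source)  # work on a copy, like the new document's ctx._source
    renames = {"user_id": "id", "price": "amount", "quantity": "count", "order_time": "order_date"}
    doc["is_high_value"] = source["price"] * source["quantity"] > 300
    for old, new in renames.items():
        doc[new] = doc.pop(old)  # copy to the new name and drop the old key
    return doc


before = {"user_id": "u002", "price": 200, "quantity": 3,
          "order_time": "2025-01-02T14:30:00Z", "tags": ["clothes"]}
print(reindex_transform(before))
```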

You can also test-run a script with the Painless execute API:

GET _scripts/painless/_execute
{
  "script":{
    "source":"""
    params.a*params.b
    """,
    "params":{
      "a":1,
      "b":2
    }
  }
}
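The execute API can also run a script inside a specific context instead of the default test context, which is useful for checking a score script against a sample document before using it in a query. A minimal sketch of such a body, assuming the context and context_setup (index, document) request fields; the script, parameter name factor, and sample document are illustrative.

```python
import json


def build_execute_body(source, document, params=None):
    """Body for POST _scripts/painless/_execute, run in the score context."""
    return {
        "script": {"source": source, "params": params or {}},
        "context": "score",
        "context_setup": {
            "index": "test_script",  # index whose mappings are used to parse the document
            "document": document,    # sample document the script runs against
        },
    }


body = build_execute_body(
    "doc['price'].value * params.factor",
    {"price": 100, "quantity": 2},
    {"factor": 0.1},
)
print(json.dumps(body, indent=2))
```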

Official Painless documentation: https://www.elastic.co/guide/en/elasticsearch/painless/8.5/painless-update-by-query-context.html