Elasticsearch DSL查询语法

一、简介

Elasticsearch DSL 提供了极其丰富的查询功能，从简单的全文搜索到复杂的地理空间查询、嵌套文档查询和聚合分析。DSL 查询的基本结构包含以下几个主要部分：

bash 复制代码

{
  "query": { ... },      // 查询条件（核心部分）
  "aggs": { ... },       // 聚合分析
  "sort": [ ... ],       // 排序规则
  "from": 0,             // 分页起始位置
  "size": 10,            // 返回结果数量
  "_source": { ... },    // 返回字段控制
  "script_fields": { ... }, // 脚本字段
  "highlight": { ... },  // 高亮显示
  "explain": true,       // 是否返回评分解释
  "timeout": "10s"       // 超时设置
}

二、query 查询条件

Elasticsearch 的 DSL 查询语言提供了丰富的查询类型，可以满足各种搜索需求。query 查询主要分为以下几大类：

全文查询（Full Text Queries）
词项级查询（Term-level Queries）
复合查询（Compound Queries）
地理位置查询（Geo Queries）
特殊查询（Specialized Queries）
嵌套和父子文档查询

2.1 全文查询（Full Text Queries）

用于在文本字段上执行全文搜索，会对查询字符串进行分析处理。

match 查询
- 用途：最基本的全文查询类型，会对查询文本进行分词处理。
bash 复制代码
```
{
  "query": {
    "match": {
      "content": {
        "query": "Elasticsearch tutorial",
        "operator": "and",  // 必须包含所有词项
        "minimum_should_match": "75%",  // 至少匹配75%的词项
        "fuzziness": "AUTO",  // 自动模糊匹配
        "analyzer": "standard"
      }
    }
  }
}
```
- 参数说明：
  - operator：默认为 or，可设为 and 要求所有词项匹配。
  - minimum_should_match：最小匹配词项数或百分比。
  - fuzziness：模糊匹配级别（"AUTO"或0-2）。
  - analyzer：指定分析器。

match_phrase 查询

用途：精确匹配整个短语，保持词项顺序。

bash 复制代码

{
  "query": {
    "match_phrase": {
      "title": {
        "query": "quick brown fox",
        "slop": 2,  // 允许词项间最多间隔2个词
        "analyzer": "english"
      }
    }
  }
}

参数说明：
- slop：允许的词项间隔距离（默认0）。
- analyzer：指定分析器。

match_phrase_prefix 查询
- 用途：短语前缀匹配，最后一个词项做前缀匹配。
bash 复制代码
```
{
  "query": {
    "match_phrase_prefix": {
      "title": {
        "query": "Elast sear",
        "max_expansions": 10,  // 最多扩展10个前缀匹配项
        "slop": 1
      }
    }
  }
}
```
- 参数说明：
  - max_expansions：前缀扩展的最大数量(默认50)。
  - slop：允许的词项间隔距离。

multi_match 查询

用途：在多个字段上执行全文搜索。

bash 复制代码

{
  "query": {
    "multi_match": {
      "query": "Elasticsearch",
      "fields": ["title^3", "content", "abstract^2"],  // 字段权重设置
      "type": "best_fields",  // 最佳匹配字段得分
      "tie_breaker": 0.3  // 其他匹配字段得分权重
    }
  }
}

query_string 查询

用途：支持Lucene查询语法。

bash 复制代码

{
  "query": {
    "query_string": {
      "default_field": "content",
      "query": "(Elasticsearch AND tutorial) OR (Kibana AND guide)",
      "default_operator": "OR",
      "allow_leading_wildcard": false,
      "fuzziness": "AUTO"
    }
  }
}

常用语法元素：
- field:value 指定字段搜索
- AND/OR/NOT 逻辑操作
- +必须包含 -必须不包含
- * 通配符
- ~ 模糊搜索
- [TO] 范围搜索

simple_query_string 查询

用途：更健壮的 query_string 简化版。

bash 复制代码

{
  "query": {
    "simple_query_string": {
      "query": "Elasticsearch +tutorial -beginner",
      "fields": ["title", "content"],
      "default_operator": "and",
      "analyze_wildcard": true
    }
  }
}

2.2 词项级查询（Term-level Queries）

term 查询

用途：精确匹配单个词项。

bash 复制代码

{
  "query": {
    "term": {
      "status": {
        "value": "published",
        "boost": 1.5  // 权重提升
      }
    }
  }
}

terms 查询

用途：匹配多个精确词项，相当于 SQL 中的 IN 查询。

bash 复制代码

{
  "query": {
    "terms": {
      "tags": ["java", "search", "database"],
      "boost": 2.0
    }
  }
}

range 查询

用途：范围查询，支持数值、日期和IP地址。

bash 复制代码

{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lt": 1000,
        "boost": 2.0,
        "format": "yyyy-MM-dd"  // 日期格式
      }
    }
  }
}

exists 查询
- 用途：查找包含指定字段的文档。
bash 复制代码
```
{
  "query": {
    "exists": {
      "field": "description"
    }
  }
}
```
prefix 查询
- 用途：前缀匹配查询。
bash 复制代码
```
{
  "query": {
    "prefix": {
      "user.id": {
        "value": "ki",
        "rewrite": "constant_score"  // 重写方法
      }
    }
  }
}
```
- 参数说明：
  - rewrite：控制如何重写查询，可选值：
    - constant_score (默认)
    - constant_score_boolean
    - scoring_boolean
    - top_terms_N (N为数字)

wildcard 查询

用途：通配符查询，支持 * (匹配多个字符) 和 ? (匹配单个字符)。

bash 复制代码

{
  "query": {
    "wildcard": {
      "user.id": {
        "value": "k*y",
        "boost": 1.0,
        "rewrite": "scoring_boolean"
      }
    }
  }
}

regexp 查询
- 用途：正则表达式匹配。
bash 复制代码
```
{
  "query": {
    "regexp": {
      "user.id": {
        "value": "k.*y",
        "flags": "ALL",
        "max_determinized_states": 10000
      }
    }
  }
}
```
- 支持的标志(flags)：
  - ALL：启用所有可选操作符
  - COMPLEMENT：允许使用 ~ 取反
  - INTERVAL：允许使用 <> 间隔
  - INTERSECTION ：允许使用 & 表示与操作
  - ANYSTRING：允许使用 @ 匹配任何字符串

fuzzy 查询

用途：模糊查询，允许一定程度的拼写错误。

bash 复制代码

{
  "query": {
    "fuzzy": {
      "title": {
        "value": "Elastcsearch",
        "fuzziness": "2",  // 允许2个字符的差异
        "prefix_length": 3  // 前3个字符必须精确匹配
      }
    }
  }
}

fuzziness 取值：
- AUTO：基于词项长度自动确定
- 0：不允许差异
- 1：允许1个字符差异
- 2：允许2个字符差异

2.3 复合查询（Compound Queries）

bool 查询

用途：组合多个查询条件。

bash 复制代码

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch" } },
        { "range": { "date": { "gte": "2023-01-01" } } }
      ],
      "should": [
        { "match": { "content": "tutorial" } },
        { "term": { "category": "technology" } }
      ],
      "must_not": [
        { "term": { "status": "archived" } }
      ],
      "filter": [
        { "term": { "language": "english" } }
      ],
      "minimum_should_match": 1,
      "boost": 1.0
    }
  }
}

各子句详解

子句	描述	影响评分	使用场景
must	必须匹配	是	主要查询条件
should	应该匹配	是	次要条件或增强相关性
must_not	必须不匹配	否	排除条件
filter	必须匹配	否	过滤条件

boosting 查询

用途：降低某些文档的得分。

bash 复制代码

{
  "query": {
    "boosting": {
      "positive": {
        "match": { "content": "apple" }
      },
      "negative": {
        "match": { "content": "pie tart fruit" }
      },
      "negative_boost": 0.5,
      "boost": 1.0
    }
  }
}

constant_score 查询

用途：固定分数查询，通常与 filter 一起使用。

bash 复制代码

{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "status": "active" }
      },
      "boost": 1.2
    }
  }
}

dis_max 查询

用途：取子查询中的最高分，适用于"最佳字段"匹配场景。

bash 复制代码

{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Elasticsearch" } },
        { "match": { "content": "Elasticsearch" } },
        { "match": { "abstract": "Elasticsearch" } }
      ],
      "tie_breaker": 0.7,
      "boost": 1.2
    }
  }
}

2.4 地理位置查询（Geo Queries）

geo_distance 查询

用途：距离范围内搜索。

bash 复制代码

{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "distance_type": "arc",
      "location": {
        "lat": 40.715,
        "lon": -73.988
      },
      "validation_method": "STRICT"
    }
  }
}

geo_bounding_box 查询

用途：矩形范围内搜索。

bash 复制代码

{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lat": 40.73,
          "lon": -74.1
        },
        "bottom_right": {
          "lat": 40.01,
          "lon": -71.12
        }
      },
      "validation_method": "COERCE"
    }
  }
}

geo_polygon 查询

用途：多边形范围内搜索

bash 复制代码

{
  "query": {
    "geo_polygon": {
      "location": {
        "points": [
          { "lat": 40.73, "lon": -74.1 },
          { "lat": 40.01, "lon": -71.12 },
          { "lat": 50.56, "lon": -90.58 }
        ]
      },
      "validation_method": "IGNORE_MALFORMED"
    }
  }
}

2.5 特殊查询（Specialized Queries）

more_like_this 查询

用途：查找相似文档

bash 复制代码

{
  "query": {
    "more_like_this": {
      "fields": ["title", "content"],
      "like": [
        {
          "_index": "articles",
          "_id": "1"
        },
        {
          "_index": "articles",
          "_id": "2"
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 25,
      "min_doc_freq": 2
    }
  }
}

script 查询

用途：脚本查询。

bash 复制代码

{
  "query": {
    "script": {
      "script": {
        "source": """
          double price = doc['price'].value;
          double discount = params.discount;
          return price * (1 - discount) > params.min_price;
        """,
        "params": {
          "discount": 0.1,
          "min_price": 50
        }
      }
    }
  }
}

pinned 查询

用途：固定某些文档在结果顶部

bash 复制代码

{
  "query": {
    "pinned": {
      "ids": ["1", "2", "3"],
      "organic": {
        "match": {
          "description": "Elasticsearch"
        }
      }
    }
  }
}

2.6 嵌套和父子文档查询

nested 查询

用途：嵌套对象查询。

bash 复制代码

{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "match": { "comments.author": "John" } },
            { "range": { "comments.date": { "gte": "2023-01-01" } } }
          ]
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 5,
        "name": "latest_comments"
      }
    }
  }
}

has_child 查询

用途：子文档查询。

bash 复制代码

{
  "query": {
    "has_child": {
      "type": "comment",
      "score_mode": "sum",
      "min_children": 2,
      "max_children": 10,
      "query": {
        "match": { "content": "great" }
      },
      "inner_hits": {}
    }
  }
}

has_parent 查询

用途：父文档查询。

bash 复制代码

{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "score": true,
      "query": {
        "term": { "category": "technology" }
      },
      "inner_hits": {
        "name": "parent_blog"
      }
    }
  }
}

三、aggs 聚合分析

Elasticsearch 的聚合分析功能提供了强大的数据统计和分析能力，能够对数据进行分组、统计和计算各种指标。聚合操作可以嵌套使用，构建复杂的数据分析管道。

3.1 聚合的三种类型

指标聚合（Metric Aggregations）：计算数值指标，如 sum、avg、max 等。
桶聚合（Bucket Aggregations）：将文档分组到不同的桶中。
管道聚合（Pipeline Aggregations）：对其他聚合的结果进行再聚合。

3.2 聚合基本结构

bash 复制代码

{
  "aggs": {                        // 也可以使用"aggregations"
    "agg_name": {                  // 自定义聚合名称
      "agg_type": {                // 聚合类型
        "agg_body": ...            // 聚合体
      }
    }
  }
}

3.3 指标聚合

基本指标聚合

avg - 平均值

bash 复制代码

{
  "aggs": {
    "avg_price": {
      "avg": { "field": "price" }
    }
  }
}

sum - 求和

bash 复制代码

{
  "aggs": {
    "total_sales": {
      "sum": { "field": "sales" }
    }
  }
}

max/min - 最大/最小值

bash 复制代码

{
  "aggs": {
    "max_age": {
      "max": { "field": "age" }
    }
  }
}

stats - 基本统计

bash 复制代码

{
  "aggs": {
    "price_stats": {
      "stats": { "field": "price" }
    }
  }
}

extended_stats - 扩展统计

bash 复制代码

{
  "aggs": {
    "price_extended_stats": {
      "extended_stats": { "field": "price" }
    }
  }
}

cardinality - 基数统计（去重计数）

bash 复制代码

{
  "aggs": {
    "unique_users": {
      "cardinality": { "field": "user_id" }
    }
  }
}

高级指标聚合

percentiles - 百分位数

bash 复制代码

{
  "aggs": {
    "load_time_percentiles": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9]
      }
    }
  }
}

percentile_ranks - 百分位排名

bash 复制代码

{
  "aggs": {
    "load_time_ranks": {
      "percentile_ranks": {
        "field": "load_time",
        "values": [500, 1000]
      }
    }
  }
}

top_hits - 返回每组顶部文档

bash 复制代码

{
  "aggs": {
    "top_tags": {
      "terms": { "field": "tags" },
      "aggs": {
        "top_tag_hits": {
          "top_hits": {
            "size": 1,
            "sort": [{ "date": { "order": "desc" } }]
          }
        }
      }
    }
  }
}

3.4 桶聚合

基本桶聚合

terms - 按词项分组

bash 复制代码

{
  "aggs": {
    "genres": {
      "terms": {
        "field": "genre",
        "size": 10,
        "order": { "_count": "desc" }
      }
    }
  }
}

range - 按范围分组

bash 复制代码

{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 50 },
          { "from": 50, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}

date_range - 日期范围分组

bash 复制代码

{
  "aggs": {
    "date_ranges": {
      "date_range": {
        "field": "date",
        "format": "yyyy-MM-dd",
        "ranges": [
          { "to": "now-10d/d" },
          { "from": "now-10d/d", "to": "now" },
          { "from": "now" }
        ]
      }
    }
  }
}

histogram - 直方图

bash 复制代码

{
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 50,
        "extended_bounds": {
          "min": 0,
          "max": 500
        }
      }
    }
  }
}

date_histogram - 日期直方图

bash 复制代码

{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0
      }
    }
  }
}

高级桶聚合

nested - 嵌套对象聚合

bash 复制代码

{
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "by_user": {
          "terms": { "field": "comments.user" }
        }
      }
    }
  }
}

filter - 过滤后聚合

bash 复制代码

{
  "aggs": {
    "high_value": {
      "filter": { "range": { "price": { "gte": 100 } } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

filters - 多过滤器聚合

bash 复制代码

{
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors": { "term": { "level": "error" } },
          "warnings": { "term": { "level": "warning" } }
        }
      }
    }
  }
}

3.5 管道聚合

基础管道聚合

avg_bucket - 计算桶平均值

bash 复制代码

{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": { "sum": { "field": "price" } }
      }
    },
    "avg_monthly_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_month>sales"
      }
    }
  }
}

derivative - 计算导数

bash 复制代码

{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": { "sum": { "field": "price" } },
        "sales_deriv": {
          "derivative": { "buckets_path": "sales" }
        }
      }
    }
  }
}

高级管道聚合

cumulative_sum - 累计和

bash 复制代码

{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": { "sum": { "field": "price" } },
        "cumulative_sales": {
          "cumulative_sum": { "buckets_path": "sales" }
        }
      }
    }
  }
}

moving_avg - 移动平均

bash 复制代码

{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": { "sum": { "field": "price" } },
        "moving_avg": {
          "moving_avg": { "buckets_path": "sales" }
        }
      }
    }
  }
}

四、sort 排序

排序是 Elasticsearch 搜索中非常重要的功能，它决定了返回结果的顺序。

4.1 基本排序语法

简单排序

bash 复制代码

{
  "query": { ... },
  "sort": [
    { "field_name": { "order": "desc" } }
  ]
}

多字段排序

bash 复制代码

{
  "query": { ... },
  "sort": [
    { "price": { "order": "asc" } },
    { "date": { "order": "desc" } }
  ]
}

4.2 排序类型

字段值排序
- 最常见的排序方式，基于字段值排序。
  bash 复制代码
```
{
  "sort": [
    { "price": { "order": "asc" } }
  ]
}
```
特殊排序
- _score：按相关性评分排序
  bash 复制代码
```
{
  "query": { "match": { "title": "elasticsearch" } },
  "sort": [
    "_score",
    { "date": "desc" }
  ]
}
```
- _doc：按索引顺序排序（性能最高）
  bash 复制代码
```
{
  "sort": "_doc"
}
```
- _shard_doc：按分片文档顺序排序
  bash 复制代码
```
{
  "sort": "_shard_doc"
}
```

多值字段排序

当字段有多个值时，可以指定如何选择排序值：

bash 复制代码

{
  "sort": [
    {
      "dates": {
        "order": "asc",
        "mode": "min",  // min/max/avg/sum/median
        "nested": {
          "path": "dates",
          "filter": { "range": { "dates.date": { "gte": "now-1y/y" } } }
        }
      }
    }
  ]
}

4.3 排序模式（mode）

当字段是多值字段时，需要指定使用哪个值进行排序：

模式	描述
min	使用最小值排序
max	使用最大值排序
sum	使用所有值的和排序
avg	使用平均值排序
median	使用中位数排序

示例：

bash 复制代码

{
  "sort": [
    {
      "prices": {
        "order": "asc",
        "mode": "avg"
      }
    }
  ]
}

4.4 地理距离排序

基本语法

bash 复制代码

{
  "sort": [
    {
      "_geo_distance": {
        "field": "location",
        "points": [ { "lat": 40.715, "lon": -73.988 } ],
        "order": "asc",
        "unit": "km",
        "distance_type": "arc"
      }
    }
  ]
}

多地点排序

bash 复制代码

{
  "sort": [
    {
      "_geo_distance": {
        "location": [ "40.715,-73.988", "41.602,-73.087" ],
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

4.5 脚本排序

使用自定义脚本进行排序：

bash 复制代码

{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['price'].value * params.factor",
        "params": {
          "factor": 1.1
        }
      },
      "order": "desc"
    }
  }
}

4.6 嵌套对象排序

基本嵌套排序

bash 复制代码

{
  "sort": [
    {
      "comments.date": {
        "order": "desc",
        "nested": {
          "path": "comments"
        }
      }
    }
  ]
}

带过滤的嵌套排序

bash 复制代码

{
  "sort": [
    {
      "comments.date": {
        "order": "asc",
        "nested": {
          "path": "comments",
          "filter": {
            "term": { "comments.verified": true }
          }
        }
      }
    }
  ]
}

五、_source 返回字段控制

_source 参数是 Elasticsearch 查询 DSL 中一个非常重要的功能，它控制着查询结果中原始文档（source document）的返回方式。合理使用 _source 可以显著优化查询性能并减少网络传输量。

5.1 _source 基础概念

_source 是 Elasticsearch 存储的原始 JSON 文档。默认情况下，当你执行搜索时，Elasticsearch 会返回完整的 _source 文档。

5.2 为什么需要控制 _source

减少网络传输：只返回需要的字段。
提高查询性能：减少数据序列化/反序列化开销。
安全性：隐藏敏感字段。
灵活性：可以重命名字段或添加脚本字段。

5.3 _source 的基本用法

禁用 _source 返回

bash 复制代码

{
  "_source": false,
  "query": {
    "match_all": {}
  }
}

返回特定字段

bash 复制代码

{
  "_source": ["field1", "field2"],
  "query": {
    "match_all": {}
  }
}

使用通配符

bash 复制代码

{
  "_source": ["user.*", "*.id"],
  "query": {
    "match_all": {}
  }
}

5.4 高级 _source 控制

包含/排除模式

bash 复制代码

{
  "_source": {
    "includes": ["*.name", "user.*"],
    "excludes": ["user.password", "*.secret"]
  },
  "query": {
    "match_all": {}
  }
}

嵌套字段控制

bash 复制代码

{
  "_source": {
    "includes": ["user.name", "comments.text"],
    "excludes": ["user.email", "comments.ip"]
  },
  "query": {
    "match_all": {}
  }
}

六、script_fields 脚本字段

script_fields 是 Elasticsearch 查询 DSL 中一个强大的功能，它允许你在查询结果中动态计算并返回新的字段值，而不需要这些字段实际存储在文档中。

6.1 script_fields 基础

基本语法结构

bash 复制代码

{
  "query": { ... },
  "script_fields": {
    "field_name": {
      "script": {
        "source": "script_source",
        "params": { ... }
      }
    }
  }
}

核心特点
- 运行时计算：在查询时动态计算字段值。
- 不改变存储：不影响原始文档，只影响返回结果。
- 灵活性强：可以使用 Painless 等脚本语言编写复杂逻辑。
- 性能考虑：脚本执行会增加查询开销。

6.2 基本用法

简单计算字段

bash 复制代码

{
  "query": { "match_all": {} },
  "script_fields": {
    "discounted_price": {
      "script": {
        "source": "doc['price'].value * 0.9"
      }
    }
  }
}

使用参数

bash 复制代码

{
  "query": { "match_all": {} },
  "script_fields": {
    "final_price": {
      "script": {
        "source": "doc['price'].value * params.discount",
        "params": {
          "discount": 0.85
        }
      }
    }
  }
}

多字段计算

bash 复制代码

{
  "query": { "match_all": {} },
  "script_fields": {
    "total_value": {
      "script": {
        "source": "doc['price'].value * doc['quantity'].value"
      }
    }
  }
}

6.3 高级用法

条件逻辑

bash 复制代码

{
  "script_fields": {
    "price_category": {
      "script": {
        "source": """
          double price = doc['price'].value;
          if (price < 50) return 'low';
          else if (price < 200) return 'medium';
          else return 'high';
        """
      }
    }
  }
}

处理数组字段

bash 复制代码

{
  "script_fields": {
    "avg_rating": {
      "script": {
        "source": """
          if (doc['ratings'].size() == 0) return 0;
          double sum = 0;
          for (rating in doc['ratings']) {
            sum += rating;
          }
          return sum / doc['ratings'].size();
        """
      }
    }
  }
}

日期处理

bash 复制代码

{
  "script_fields": {
    "days_since_post": {
      "script": {
        "source": """
          ChronoUnit.DAYS.between(
            doc['post_date'].value,
            Instant.now()
          )
        """,
        "lang": "painless"
      }
    }
  }
}

七、highlight 高亮显示

高亮(highlight)是 Elasticsearch 中一个非常有用的功能，它能够将查询匹配到的关键词在返回的文本内容中标记出来，便于用户快速定位匹配内容。

7.1 基础用法

基本语法结构

bash 复制代码

{
  "query": { ... },
  "highlight": {
    "fields": {
      "field_name": { ... }
    }
  }
}

简单示例

bash 复制代码

{
  "query": {
    "match": { "content": "Elasticsearch" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

7.2 核心配置参数

高亮标签设置

bash 复制代码

{
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {
      "content": {}
    }
  }
}

高亮片段控制
bash 复制代码
```
{
  "highlight": {
    "fields": {
      "content": {
        "fragment_size": 150,
        "number_of_fragments": 3,
        "no_match_size": 150
      }
    }
  }
}
```
- fragment_size：每个高亮片段的大小（字符数）
- number_of_fragments：返回的最大片段数（0表示返回整个字段）
- no_match_size：没有匹配时返回的字段长度

高亮策略

bash 复制代码

{
  "highlight": {
    "fields": {
      "content": {
        "type": "plain",  // 或"fvh"(fast vector highlighter),"unified"
        "boundary_scanner": "sentence"
      }
    }
  }
}

7.3 高级 highlight 功能

多字段高亮

bash 复制代码

{
  "highlight": {
    "fields": {
      "title": {},
      "content": {
        "number_of_fragments": 5
      }
    }
  }
}

不同字段不同标签

bash 复制代码

{
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {
      "title": {
        "pre_tags": ["<b>"],
        "post_tags": ["</b>"]
      },
      "content": {}
    }
  }
}

短语高亮优化

bash 复制代码

{
  "query": {
    "match_phrase": { "content": "Elasticsearch tutorial" }
  },
  "highlight": {
    "fields": {
      "content": {
        "phrase_limit": 10,
        "highlight_query": {
          "bool": {
            "should": [
              { "match": { "content": "Elasticsearch" } },
              { "match": { "content": "tutorial" } }
            ]
          }
        }
      }
    }
  }
}