Full-text search and filtering in Elasticsearch

1. Local or cloud cluster

# Start ES + Kibana with a single command (8.x and later)
curl -fsSL https://elastic.co/start-local | sh

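Open http://localhost:5601 in a browser and go to Kibana → Dev Tools → Console. Every JSON request below can be pasted there and run as-is.

Cloud / Serverless users: as long as your account has the superuser or developer role, the same requests can be pasted in unchanged.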

2. Mappings and nested fields

2.1. Create the cooking_blog index

PUT /cooking_blog

2.2. Full mapping (with nested ingredients)

PUT /cooking_blog/_mapping
{
  "properties": {
    "title":       { "type": "text", "fields": {"keyword": {"type": "keyword","ignore_above":256 }}},
    "description": { "type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "author":      { "type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "date":        { "type": "date", "format": "yyyy-MM-dd"},
    "category":    { "type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "tags":        { "type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "rating":      { "type": "float"},

    "ingredients": {                         # ingredient list
      "type": "nested",
      "properties": {
        "name":     { "type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "quantity": { "type": "float"},
        "unit":     { "type": "keyword"}
      }
    }
  }
}
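  • text + keyword multi-fields: support analyzed full-text search and exact-match filtering at the same time
  • nested: indexes each ingredient as its own hidden document, so conditions on name, quantity, and unit must match within the same ingredient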
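
To see why nested matters, here is a small Python simulation (not Elasticsearch code) of how an array of objects behaves under the default object type, where sub-fields are flattened into multi-valued fields, versus nested, where each object is matched on its own. The helper names are hypothetical and the matching is deliberately simplified.

```python
# Illustrative simulation only: mimics how Elasticsearch flattens arrays of
# objects under the default "object" type, versus keeping each one separate
# under "nested". Function names are hypothetical.

ingredients = [
    {"name": "flour", "unit": "g"},
    {"name": "buttermilk", "unit": "ml"},
]


def flatten_like_object(items):
    """Object type: each sub-field becomes one multi-valued field on the parent."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(key, set()).add(value)
    return flat


def match_flattened(items, name, unit):
    """Both conditions are checked against the merged value sets."""
    flat = flatten_like_object(items)
    return name in flat.get("name", set()) and unit in flat.get("unit", set())


def match_nested(items, name, unit):
    """Nested type: both conditions must hold inside the same object."""
    return any(i.get("name") == name and i.get("unit") == unit for i in items)


# "flour measured in ml" is not in the data, yet the flattened form matches it.
print(match_flattened(ingredients, "flour", "ml"))  # True (false positive)
print(match_nested(ingredients, "flour", "ml"))     # False
```

The false positive in the flattened case is the reason the ingredients field is mapped as nested rather than left as a plain object.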

3. Bulk-loading sample data

POST /cooking_blog/_bulk?refresh=wait_for
{ "index": { "_id": "1" } }
{"title":"Perfect Pancakes: A Fluffy Breakfast Delight","description":"Learn the secrets...","author":"Maria Rodriguez","date":"2023-05-01","category":"Breakfast","tags":["pancakes","breakfast","easy recipes"],"rating":4.8,"ingredients":[{"name":"flour","quantity":200,"unit":"g"},{"name":"buttermilk","quantity":250,"unit":"ml"}]}
{ "index": { "_id": "2" } }
{"title":"Spicy Thai Green Curry: A Vegetarian Adventure","description":"Dive into the flavors...","author":"Liam Chen","date":"2023-05-05","category":"Main Course","tags":["thai","vegetarian","curry","spicy"],"rating":4.6}
{ "index": { "_id": "3" } }
{"title":"Vegan Chocolate Avocado Mousse","description":"Discover the magic of avocado...","author":"Alex Green","date":"2023-05-15","category":"Dessert","tags":["vegan","chocolate","avocado"],"rating":4.5}
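
When the data comes from an application rather than from Console, the same newline-delimited body has to be assembled by hand. The following is a minimal sketch, assuming the documents are held in a dict keyed by _id; the function name and the trimmed sample documents are hypothetical.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON _bulk body: one action line plus one source line per doc.

    `docs` maps a document _id (str) to its source dict. The body must end
    with a newline, which the _bulk API requires.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical sample documents, trimmed for brevity.
sample = {
    "1": {"title": "Perfect Pancakes", "rating": 4.8},
    "2": {"title": "Spicy Thai Green Curry", "rating": 4.6},
}
body = build_bulk_body(sample)
print(body)
```

Each document contributes exactly two lines, so a body for N documents has 2N lines. The result would be sent to /cooking_blog/_bulk with the Content-Type header set to application/x-ndjson.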

4. match / match_phrase / multi_match

4.1. match: standard full-text search

GET /cooking_blog/_search
{
  "_source": ["title","description"],
  "query": {
    "match": {
      "description": "fluffy pancakes"
    }
  }
}
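  • The default operator is OR: a document matches if any one term matches
  • To require every term, use the object form: "description": { "query": "fluffy pancakes", "operator": "and" }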
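
The difference between the two operators can be sketched in a few lines of Python. This is a simulation under simplifying assumptions: the analyze function is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics), scoring is ignored, and the field value is made up.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into terms."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def match(field_text, query, operator="or"):
    """Return True if the field matches the query under the given operator."""
    doc_terms = set(analyze(field_text))
    query_terms = analyze(query)
    if not query_terms:
        return False
    hits = [t in doc_terms for t in query_terms]
    return all(hits) if operator == "and" else any(hits)


text = "Light pancakes with maple syrup"  # hypothetical field value
print(match(text, "fluffy pancakes"))         # True: "pancakes" is enough
print(match(text, "fluffy pancakes", "and"))  # False: "fluffy" is missing
```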

4.2. match_phrase: order and distance

GET /cooking_blog/_search
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "Thai green curry",
        "slop": 2            # allow terms to sit up to 2 positions out of place
      }
    }
  }
}

4.3. multi_match: several fields in one query

GET /cooking_blog/_search
{
  "query": {
    "multi_match": {
      "query":  "vegetarian curry",
      "type":   "cross_fields",
      "fields": ["title^3", "description^2", "tags"]
    }
  }
}
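  • cross_fields: treats the listed fields as one combined field, which suits queries whose terms are spread across several fields
  • ^ boosts: title > description > tags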

4.4. simple_query_string: user-friendly syntax

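GET /cooking_blog/_search
{
  "query": {
    "simple_query_string": {
      "query": "(\"green curry\" | \"thai curry\") +-spicy",
      "fields": ["title","description"]
    }
  }
}
  • With the default OR operator, a bare -spicy would also match every document that lacks "spicy"; +-spicy makes the exclusion mandatory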

5. Exact filtering: term / range / exists

5.1. Exact category match

GET /cooking_blog/_search
{
  "query": {
    "term": {
      "category.keyword": "Dessert"
    }
  }
}

5.2. Rating range

GET /cooking_blog/_search
{
  "query": {
    "range": {
      "rating": { "gte": 4.7 }
    }
  }
}

5.3. Last 30 days

GET /cooking_blog/_search
{
  "query": {
    "range": {
      "date": { "gte": "now-30d/d" }
    }
  }
}
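
The date math expression now-30d/d reads as "now, minus 30 days, rounded to the day". With gte the rounding goes down to the start of the day. The sketch below reproduces that arithmetic in Python for a fixed, hypothetical "now" in UTC; the function name is an assumption and it only covers this one expression shape.

```python
from datetime import datetime, timedelta, timezone


def resolve_now_minus_days_floor(now, days):
    """Resolve `now-<days>d/d` as used with gte: subtract, then round down to 00:00 UTC."""
    shifted = now.astimezone(timezone.utc) - timedelta(days=days)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2023, 6, 10, 15, 42, tzinfo=timezone.utc)  # hypothetical "now"
lower = resolve_now_minus_days_floor(now, 30)
print(lower.isoformat())  # 2023-05-11T00:00:00+00:00
```

With that lower bound, the sample post dated 2023-05-15 would pass the range filter and the one dated 2023-05-05 would not.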

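In filter context (for example under bool.filter or constant_score) these clauses are not scored, run faster, and can be cached. Placed directly under query, as in the examples above, they run in query context and still produce a score.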
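
A minimal sketch of moving the term and range clauses from this section, plus the exists clause named in the section title, into bool.filter so that they run without scoring. The helper name filtered_search is hypothetical; the request body it produces uses only standard query DSL.

```python
import json


def filtered_search(*clauses):
    """Wrap term/range/exists clauses in bool.filter so they run in filter context."""
    return {"query": {"bool": {"filter": list(clauses)}}}


body = filtered_search(
    {"term": {"category.keyword": "Dessert"}},
    {"range": {"rating": {"gte": 4.5}}},
    {"exists": {"field": "tags"}},  # only documents that have a tags value
)
print(json.dumps(body, indent=2))
```

All clauses under filter must match, and every hit comes back with the same score, which is what makes the results cacheable.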

6. bool, boosting, function_score

6.1. bool: the most common combination

GET /cooking_blog/_search
{
  "_source": ["title","tags","rating","date"],
  "query": {
    "bool": {
      "must": [
        { "term":  { "tags.keyword": "vegetarian" } },
        { "range": { "rating": { "gte": 4.5 } } }
      ],
      "should": [
        { "term":  { "category.keyword": "Main Course" } },
        { "range": { "date": { "gte": "now-1M/d" } } }
      ],
      "must_not": [
        { "term": { "category.keyword": "Dessert" } }
      ],
      "minimum_should_match": 1
    }
  }
}
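
To make the clause semantics concrete, here is a Python simulation of the matching logic (not the scoring) run against trimmed copies of the three sample documents. It is a sketch under assumptions: the bool_match helper is hypothetical, and the relative date clause from should is left out because its result depends on the current date.

```python
docs = [
    {"id": "1", "category": "Breakfast", "tags": ["pancakes", "breakfast", "easy recipes"], "rating": 4.8},
    {"id": "2", "category": "Main Course", "tags": ["thai", "vegetarian", "curry", "spicy"], "rating": 4.6},
    {"id": "3", "category": "Dessert", "tags": ["vegan", "chocolate", "avocado"], "rating": 4.5},
]


def bool_match(doc, must=(), should=(), must_not=(), minimum_should_match=0):
    """Matching rules of a bool query; each clause is a predicate on the document."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


hits = [
    doc["id"]
    for doc in docs
    if bool_match(
        doc,
        must=[lambda d: "vegetarian" in d["tags"], lambda d: d["rating"] >= 4.5],
        should=[lambda d: d["category"] == "Main Course"],
        must_not=[lambda d: d["category"] == "Dessert"],
        minimum_should_match=1,
    )
]
print(hits)  # ['2']
```

In the real query, should clauses also raise the score of the documents they match; the simulation only shows which documents pass.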

6.2. boosting: demote unwanted hits

GET /cooking_blog/_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "tags": "chocolate" } },
      "negative": { "term":  { "tags.keyword": "vegan" } },
      "negative_boost": 0.3
    }
  }
}
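
Unlike must_not, the negative clause does not remove hits; it scales their score down. A sketch of the arithmetic, with a made-up relevance score of 1.2 and a hypothetical helper name:

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a hit that already matched the positive query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical score of 1.2 from the positive match on "chocolate".
print(round(boosting_score(1.2, matches_negative=True, negative_boost=0.3), 2))   # 0.36
print(round(boosting_score(1.2, matches_negative=False, negative_boost=0.3), 2))  # 1.2
```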

6.3. function_score: decay by date, boost by rating

GET /cooking_blog/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "description": "curry" } },
      "functions": [
        {
          "gauss": {
            "date": { "origin": "now", "scale": "30d", "decay": 0.5 }
          }
        },
        {
          "field_value_factor": {
            "field": "rating",
            "modifier": "sqrt"
          }
        }
      ],
      "boost_mode": "sum"
    }
  }
}
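
The numbers behind this request can be worked through by hand. The sketch below follows the Gaussian decay formula from the function_score documentation and assumes the defaults that the request leaves unset: offset 0, factor 1, and score_mode multiply for combining the two functions. Function names and the sample inputs are hypothetical.

```python
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: 1.0 at the origin, exactly `decay` at distance scale + offset."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, abs(distance) - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def final_score(query_score, age_days, rating):
    """Combine the two functions, then add the result to the query score."""
    recency = gauss_decay(age_days, scale=30.0, decay=0.5)
    popularity = math.sqrt(rating)         # field_value_factor with modifier sqrt
    function_score = recency * popularity  # score_mode defaults to "multiply"
    return query_score + function_score    # boost_mode "sum"


print(gauss_decay(30, 30))         # about 0.5: a 30-day-old post keeps half its recency weight
print(final_score(1.0, 0, 4.0))    # 3.0: brand-new post, rating 4.0
print(final_score(1.0, 30, 4.0))   # about 2.0: same post one scale later
```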

7. Highlighting, sorting, and pagination

GET /cooking_blog/_search
{
  "from": 0, "size": 5,
  "sort": [
    { "rating": { "order": "desc" } },
    { "date":   { "order": "desc" } }
  ],
  "query": { "match": { "description": "chocolate" } },
  "highlight": {
    "pre_tags": ["<em>"], "post_tags": ["</em>"],
    "fields": { "description": {} }
  }
}
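  • Deep pagination: switch to search_after (scroll also works, though Elasticsearch no longer recommends it for deep pagination)
  • Highlighting: by default the field text is re-analyzed with the field's analyzer; if that is a concern, index the field with term_vector or as a stored field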
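
For deep pagination, each request after the first carries the sort values of the previous page's last hit. The following is a minimal sketch of building those request bodies, assuming a point in time (PIT) has already been opened with POST /cooking_blog/_pit?keep_alive=1m. The helper name, the PIT id, and the sort values are hypothetical.

```python
import copy


def next_page_body(base_body, pit_id, last_hit=None, keep_alive="1m"):
    """Build the body for the next page of a PIT + search_after scan."""
    body = copy.deepcopy(base_body)
    body.pop("from", None)  # offset paging is replaced by search_after
    body["pit"] = {"id": pit_id, "keep_alive": keep_alive}
    if last_hit is not None:
        # Reuse the sort values of the last hit on the previous page.
        body["search_after"] = last_hit["sort"]
    return body


base = {
    "from": 0, "size": 5,
    "sort": [{"rating": {"order": "desc"}}, {"date": {"order": "desc"}}],
    "query": {"match": {"description": "chocolate"}},
}
# Hypothetical last hit: rating, date in epoch millis, and the tiebreaker value.
last_hit = {"sort": [4.5, 1684108800000, 7]}
page2 = next_page_body(base, "hypothetical-pit-id", last_hit)
print(page2["search_after"])  # [4.5, 1684108800000, 7]
```

Requests that carry a pit are sent to GET /_search without an index name, because the PIT already identifies the index. The third sort value comes from the _shard_doc tiebreaker that Elasticsearch adds to PIT searches.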

8. Facets for the front end

GET /cooking_blog/_search
{
  "size": 0,
  "aggs": {
    "by_category": {          "terms": { "field": "category.keyword" } },
    "tag_top10":   {          "terms": { "field": "tags.keyword", "size": 10 } },
    "rating_stats":{          "stats": { "field": "rating" } },
    "monthly":     {
      "date_histogram": {
        "field": "date", "calendar_interval": "month"
      }
    }
  }
}
  • Drives UI elements such as category counts, popular tags, rating statistics, and monthly posting activity
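
On the front-end side, a terms aggregation arrives as a list of buckets with key and doc_count. A small sketch of turning that into label and count pairs for a facet list; the helper name is hypothetical and the response is a trimmed, made-up example for the three sample documents.

```python
def terms_facet(response, agg_name):
    """Turn a terms aggregation into (label, count) pairs for a facet list."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical, trimmed response body.
response = {
    "aggregations": {
        "by_category": {
            "buckets": [
                {"key": "Breakfast", "doc_count": 1},
                {"key": "Dessert", "doc_count": 1},
                {"key": "Main Course", "doc_count": 1},
            ]
        }
    }
}
print(terms_facet(response, "by_category"))
```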

9. The nested ingredients list

GET /cooking_blog/_search
{
  "query": {
    "nested": {
      "path": "ingredients",
      "query": {
        "bool": {
          "must": [
            { "match": { "ingredients.name": "flour" } },
            { "term":  { "ingredients.unit": "g" } }
          ]
        }
      },
      "inner_hits": { "size": 3 }
    }
  }
}
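
With inner_hits enabled, each hit reports which nested ingredients actually matched, under inner_hits keyed by the nested path. A sketch of reading them out in Python; the helper name is hypothetical and the hit is a trimmed, made-up example.

```python
def matched_ingredients(hit, path="ingredients"):
    """Return the _source of each nested object that matched, for one search hit."""
    inner = hit.get("inner_hits", {}).get(path, {})
    return [h["_source"] for h in inner.get("hits", {}).get("hits", [])]


# Hypothetical, trimmed hit for document 1.
hit = {
    "_id": "1",
    "inner_hits": {
        "ingredients": {
            "hits": {
                "hits": [
                    {"_source": {"name": "flour", "quantity": 200, "unit": "g"}}
                ]
            }
        }
    },
}
print(matched_ingredients(hit))
```

Only the flour entry is returned, since buttermilk is measured in ml and does not satisfy both conditions.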

10. Performance and relevance tuning checklist

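Scenario: tuning notes
  • Read-heavy, rarely updated: raise refresh_interval, force-merge segments, and reduce write amplification
  • Fixed filter dimensions: put them in filter context; the query cache is managed by Elasticsearch automatically
  • Slow deep pagination: use search_after, ideally with a point in time (PIT, introduced in 7.10)
  • Sorting on high-cardinality fields: sort on a field with doc_values (enabled by default for keyword, numeric, and date fields), or pre-sort the index with index.sort
  • Synonyms / stop words: define a custom analyzer, and keep index-time and search-time synonym strategies separate
  • Multiple languages: one sub-field per language, such as title.en / title.fr, queried with multi_match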