Elasticsearch Data Modeling Explained: nested vs parent-child (Must-Know for Senior Engineers)

This article covers only the two parts of ES data modeling that are easiest to get wrong and that most clearly separate skill levels:
nested and parent-child (join).

Goal: help you pick the right model, explain how it works, and justify the trade-offs, both in interviews and in real projects.

1. Why do "object arrays" in Elasticsearch bite you?

Start with an example that looks intuitively right but that ES gets wrong by default.

1.1 Business data (product + attributes)

```json
{
  "id": "p1",
  "attrs": [
    { "name": "color", "value": "red" },
    { "name": "size",  "value": "L" }
  ]
}
```

1.2 The query you want

color = red AND size = L (on the same product, with each name matched to its own value)

1.3 How the default object type actually stores it (key point)

ES flattens the array of objects into this:

```text
attrs.name  = ["color", "size"]
attrs.value = ["red", "L"]
```

So this query:

```json
GET product/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "attrs.name": "color" } },
        { "term": { "attrs.value": "L" } }
      ]
    }
  }
}
```

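❌ matches p1 on a combination that does not exist

(no single attrs object has name = color and value = L: color's value is red, and L belongs to the size object)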
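The false match can be reproduced with a small Java simulation. This is a simplified model, not ES's actual Lucene storage: `FlattenDemo`, `flatten`, and `matches` are hypothetical names, and each `term` clause is modeled as "the value appears anywhere in the flattened field".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified simulation (not ES internals): a default "object" array is flattened
// into one multi-valued field per property, so the link between a name and the
// value in the same object is lost.
class FlattenDemo {
    // [{name, value}, ...]  ->  attrs.name = [...], attrs.value = [...]
    static Map<String, List<String>> flatten(List<Map<String, String>> attrs) {
        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (Map<String, String> a : attrs) {
            names.add(a.get("name"));
            values.add(a.get("value"));
        }
        return Map.of("attrs.name", names, "attrs.value", values);
    }

    // bool.must of two term clauses: each clause only checks its own flattened field
    static boolean matches(Map<String, List<String>> doc, String name, String value) {
        return doc.get("attrs.name").contains(name)
            && doc.get("attrs.value").contains(value);
    }

    public static void main(String[] args) {
        List<Map<String, String>> attrs = List.of(
            Map.of("name", "color", "value", "red"),
            Map.of("name", "size", "value", "L"));
        Map<String, List<String>> doc = flatten(attrs);
        System.out.println(doc.get("attrs.name"));      // [color, size]
        System.out.println(doc.get("attrs.value"));     // [red, L]
        System.out.println(matches(doc, "color", "L")); // true: a false positive
    }
}
```

Once the values sit in two independent lists, nothing can tell `color` and `L` apart from `color` and `red`. That is exactly the information nested preserves.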

2. nested: conditions that must match within the same object

2.1 What nested really is

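Under the hood, each nested object is stored as a separate hidden document
A nested query restricts matching to "within the same object"

👉 Trade-off:
queries cost more, but the semantics are exactly right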
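The matching rule can be sketched as a simplified Java model (not ES internals). Each nested object is treated as its own hidden document, and the root document matches only if one hidden document satisfies all inner conditions. `NestedDemo`, `nestedMatch`, and `pair` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified simulation (not ES internals): each nested object acts like a hidden
// document, and a nested query matches the root document only if ONE of those
// hidden documents satisfies ALL inner conditions.
class NestedDemo {
    // nested query: does any single nested object satisfy the inner query?
    static boolean nestedMatch(List<Map<String, String>> attrs,
                               Predicate<Map<String, String>> inner) {
        return attrs.stream().anyMatch(inner);
    }

    // inner bool.must: name and value are checked against the same object
    static Predicate<Map<String, String>> pair(String name, String value) {
        return a -> name.equals(a.get("name")) && value.equals(a.get("value"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> attrs = List.of(
            Map.of("name", "color", "value", "red"),
            Map.of("name", "size", "value", "L"));
        System.out.println(nestedMatch(attrs, pair("color", "L")));   // false
        System.out.println(nestedMatch(attrs, pair("color", "red"))); // true
        // color = red AND size = L: two nested clauses combined in an outer bool.must
        System.out.println(nestedMatch(attrs, pair("color", "red"))
            && nestedMatch(attrs, pair("size", "L")));                // true
    }
}
```

The last line shows why a cross-attribute condition takes two separate nested clauses: each clause is evaluated against one object at a time.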

2.2 Mapping definition (nested)

```json
PUT product
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "attrs": {
        "type": "nested",
        "properties": {
          "name":  { "type": "keyword" },
          "value": { "type": "keyword" }
        }
      }
    }
  }
}
```

2.3 nested query example (core)

```json
GET product/_search
{
  "query": {
    "nested": {
      "path": "attrs",
      "query": {
        "bool": {
          "must": [
            { "term": { "attrs.name": "color" } },
            { "term": { "attrs.value": "red" } }
          ]
        }
      }
    }
  }
}
```

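✔️ Matches only when name and value are in the same attrs object. To express color = red AND size = L, put two such nested clauses inside an outer bool.must.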

2.4 Spring Boot example (nested)

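```java
import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.index.query.NestedQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// ES 7.x query builders (e.g. passed to Spring Data Elasticsearch 4.x NativeSearchQueryBuilder.withQuery);
// builds the same nested query as in 2.3
NestedQueryBuilder nestedQuery =
    QueryBuilders.nestedQuery(
        "attrs",                                   // nested path
        QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("attrs.name", "color"))
            .must(QueryBuilders.termQuery("attrs.value", "red")),
        ScoreMode.None                             // ignore child relevance scores
    );
```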

2.5 Typical nested use cases

Product attributes (key-value)
Tags + weights
Any case where combined conditions must hold within the same element

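2.6 The cost of nested (mention this in interviews)

Each nested element = one hidden document
Many nested elements → the document count balloons
Queries need a join-like step → higher CPU cost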
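The document-count growth is simple arithmetic: each root document plus each of its nested objects is a separate Lucene document. The sketch below is a back-of-the-envelope calculation with hypothetical numbers. `NestedDocCount` and `luceneDocs` are illustrative names.

```java
// Back-of-the-envelope sketch: one root document + one hidden Lucene
// document per nested object.
class NestedDocCount {
    static long luceneDocs(long products, long nestedPerProduct) {
        return products * (1 + nestedPerProduct);
    }

    public static void main(String[] args) {
        // hypothetical: 1 million products, 20 nested attrs each
        System.out.println(luceneDocs(1_000_000, 20)); // 21000000
    }
}
```

For the same reason, the `docs.count` reported by `_cat/indices` (read straight from Lucene) includes hidden nested documents, so it can be much larger than what the `_count` API returns for the same index.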

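3. parent-child (join): independent document lifecycles

3.1 What problem does parent-child solve?

When you run into situations like these:

There are very many child objects
Child objects are updated frequently
You don't want to rewrite the whole parent document just to change one child

👉 parent-child is the better fit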

3.2 Mapping definition (join type)

```json
PUT order
{
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": {
          "order": "order_item"
        }
      }
    }
  }
}
```

3.3 Indexing examples

Parent document

```json
PUT order/_doc/1
{
  "relation": "order",
  "orderNo": "O20250001"
}
```

Child document (must be routed to its parent's shard: routing=1 is the parent's _id)

```json
PUT order/_doc/2?routing=1
{
  "relation": {
    "name": "order_item",
    "parent": "1"
  },
  "skuId": "sku-1",
  "price": 100
}
```
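Routing matters because ES picks a document's shard from a hash of its routing value, which defaults to the document `_id`. Below is a simplified sketch of that rule. `String.hashCode()` stands in for ES's real Murmur3-based hash, so the shard numbers will not match a real cluster, and the shard count of 5 is hypothetical. The point is only that the same routing value always yields the same shard.

```java
// Simplified routing sketch: shard = hash(routing) mod primaryShards.
// String.hashCode() is a stand-in for ES's Murmur3-based hash (assumption).
class RoutingDemo {
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int shards = 5;                                  // hypothetical primary shard count
        int parentShard  = shardFor("1", shards);        // parent _id = 1, default routing
        int childRouted  = shardFor("1", shards);        // child indexed with ?routing=1
        int childDefault = shardFor("2", shards);        // child _id = 2 without routing
        System.out.println(parentShard == childRouted);  // true: always co-located
        System.out.println(parentShard == childDefault); // false with these inputs (4 vs 0)
    }
}
```

Passing the parent's `_id` as `routing` makes co-location deterministic. Relying on the child's own `_id` would only land it on the parent's shard by chance.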

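3.4 has_child: find parents by child conditions

Returns the parent (order) documents that have at least one order_item with price >= 100.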

```json
GET order/_search
{
  "query": {
    "has_child": {
      "type": "order_item",
      "query": {
        "range": {
          "price": { "gte": 100 }
        }
      }
    }
  }
}
```
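The `has_child` semantics can be modeled in memory as "collect the parent ids of matching children, then return those parents". This is a simplified Java simulation, not how ES executes the join. `HasChildDemo`, the `Order`/`Item` records, and the second order `O20250002` are hypothetical sample data.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Simplified simulation (not ES internals) of has_child:
// return the parents that have at least one child matching the inner query.
class HasChildDemo {
    record Order(String id, String orderNo) {}
    record Item(String id, String parentId, String skuId, int price) {}

    static List<Order> hasChild(List<Order> orders, List<Item> items, Predicate<Item> childQuery) {
        Set<String> parentIds = items.stream()
            .filter(childQuery)          // children matching the inner query
            .map(Item::parentId)         // ... mapped to their parent ids
            .collect(Collectors.toSet());
        return orders.stream().filter(o -> parentIds.contains(o.id())).toList();
    }

    public static void main(String[] args) {
        // hypothetical sample data: order 1 from the text plus an extra order 3
        List<Order> orders = List.of(new Order("1", "O20250001"), new Order("3", "O20250002"));
        List<Item> items = List.of(
            new Item("2", "1", "sku-1", 100),
            new Item("4", "3", "sku-2", 50));
        // range: price >= 100
        System.out.println(hasChild(orders, items, i -> i.price() >= 100));
        // [Order[id=1, orderNo=O20250001]]
    }
}
```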

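3.5 has_parent: find children by parent conditions

Returns the child (order_item) documents whose parent order has orderNo O20250001.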

```json
GET order/_search
{
  "query": {
    "has_parent": {
      "parent_type": "order",
      "query": {
        "term": { "orderNo": "O20250001" }
      }
    }
  }
}
```
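`has_parent` runs the same relationship in the opposite direction: find the matching parents, then return their children. Below is a simplified in-memory Java model, not ES's execution strategy. `HasParentDemo`, the records, and the extra sample rows are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Simplified simulation (not ES internals) of has_parent:
// return the children whose parent matches the inner query.
class HasParentDemo {
    record Order(String id, String orderNo) {}
    record Item(String id, String parentId, String skuId, int price) {}

    static List<Item> hasParent(List<Order> orders, List<Item> items, Predicate<Order> parentQuery) {
        Set<String> parentIds = orders.stream()
            .filter(parentQuery)         // parents matching the inner query
            .map(Order::id)
            .collect(Collectors.toSet());
        return items.stream().filter(i -> parentIds.contains(i.parentId())).toList();
    }

    public static void main(String[] args) {
        // hypothetical sample data
        List<Order> orders = List.of(new Order("1", "O20250001"), new Order("3", "O20250002"));
        List<Item> items = List.of(
            new Item("2", "1", "sku-1", 100),
            new Item("4", "3", "sku-2", 50));
        // term: orderNo = O20250001
        System.out.println(hasParent(orders, items, o -> o.orderNo().equals("O20250001")));
        // [Item[id=2, parentId=1, skuId=sku-1, price=100]]
    }
}
```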

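4. nested vs parent-child: head-to-head comparison (memorize this)

| Dimension | nested | parent-child |
|---|---|---|
| Storage | Same document | Separate documents |
| Query performance | Faster | Slower |
| Update cost | High (rewrites the parent document) | Low (only the child is rewritten) |
| Number of child objects | Few to moderate | Many |
| Query semantics | Exact | Exact |
| Implementation complexity | Low | High |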
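The "update cost" row follows from Lucene documents being immutable. An update reindexes the whole document, and for nested that means the root plus every hidden nested document. A back-of-the-envelope sketch, assuming a hypothetical order with 500 line items (`UpdateCostDemo` is an illustrative name):

```java
// Back-of-the-envelope sketch: how many Lucene documents are rewritten
// when a single child changes.
class UpdateCostDemo {
    // nested: the root document and all of its nested documents are reindexed together
    static long nestedDocsRewritten(long childrenPerParent) {
        return 1 + childrenPerParent;
    }

    // parent-child: the child is its own document, so only it is reindexed
    static long joinDocsRewritten() {
        return 1;
    }

    public static void main(String[] args) {
        long children = 500; // hypothetical: one order with 500 line items
        System.out.println(nestedDocsRewritten(children)); // 501
        System.out.println(joinDocsRewritten());           // 1
    }
}
```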

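5. Selection rules for real projects (say this in interviews)

Few child objects, infrequent updates → nested
Many child objects, frequent updates → parent-child
Both look complicated → first ask whether the business really needs ES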

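6. Common pitfalls (interview bonus points)

❌ Forgetting nested, so conditions leak across objects
❌ Using nested as if it were a relational join
❌ parent-child without correct routing: the child is not on its parent's shard, so the relationship breaks
❌ Using parent-child as an OLTP store (update storms)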

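7. A "standard answer structure" for interviewer follow-ups

When asked:

"Why did you use nested / parent-child?"

Answer in this order:

Business characteristics (object count / update frequency / query patterns)
ES internals (flattening / join cost)
Performance and maintenance trade-offs
Why not a relational database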

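8. One-line summary (memorize this)

nested means "consistency within a document";
parent-child means "independent lifecycles across documents".
If nested does the job, don't reach for join;
if you can do without ES, don't force it.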