properties
https://www.elastic.co/guide/en/elasticsearch/reference/8.8/properties.html
Nested objects
https://www.elastic.co/guide/en/elasticsearch/reference/8.8/object.html
Create the index
javascript
curl --location --request PUT 'https://myhost/order_index' \
--header 'Content-Type: application/json' \
--data '{
"mappings": {
"dynamic": "false",
"properties": {
"orderId": {
"type": "keyword"
},
"orderItems" : {
"properties": {
"itemType": {
"type": "keyword"
},
"itemName": {
"type": "keyword"
}
}
}
}
}
}'
Write data
javascript
curl --location --request PUT 'https://myhost/order_index/_doc/1' \
--header 'Content-Type: application/json' \
--data '{
"orderId": "1_1",
"orderItems": [{
"itemType": "food",
"itemName": "egg"
},
{
"itemType": "clothes",
"itemName": "T-shirt"
}
]
}'
curl --location --request PUT 'https://myhost/order_index/_doc/2' \
--header 'Content-Type: application/json' \
--data '{
"orderId": "2_2",
"orderItems": [{
"itemType": "food",
"itemName": "pork"
},
{
"itemType": "poultryEggs",
"itemName": "egg"
}
]
}'
Read data
javascript
curl --location --request GET 'https://myhost/order_index/_search' \
--header 'Content-Type: application/json' \
--data '{
"query":{
"bool":{
"must":[
{
"match":{
"orderItems.itemType":"food"
}
},
{
"match":{
"orderItems.itemName":"egg"
}
}
]
}
}
}'
Query result
javascript
{
"took": 763,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.723315,
"hits": [
{
"_index": "order_index",
"_type": "_doc",
"_id": "2",
"_score": 0.723315,
"_source": {
"orderId": "2_2",
"orderItems": [
{
"itemType": "food",
"itemName": "pork"
},
{
"itemType": "poultryEggs",
"itemName": "egg"
}
]
}
},
{
"_index": "order_index",
"_type": "_doc",
"_id": "1",
"_score": 0.723315,
"_source": {
"orderId": "1_1",
"orderItems": [
{
"itemType": "food",
"itemName": "egg"
},
{
"itemType": "clothes",
"itemName": "T-shirt"
}
]
}
}
]
}
}
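Pros and cons
- Disadvantage: the association between fields of the same item object is lost (the official documentation explains this: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html). When searching on two fields, each condition can be satisfied by a different item in the array, so the result behaves like an "or" across items. In the example, order 2 is returned incorrectly: one item has "orderItems.itemType":"food" and a different item has "orderItems.itemName":"egg".
- Advantage: the underlying implementation is simple, and in a one-to-one scenario the drawback above does not apply, i.e. a hit really does have "orderItems.itemType":"food" and "orderItems.itemName":"egg" on the same item.
Tip: if order and orderItems have a one-to-one relationship in the database, nested objects work without any problem. If the relationship is one-to-many, searches can return wrong results, and one of the approaches below has to be used.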
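The cross-item match can be reproduced without a cluster. The following is a minimal Python sketch, not Elasticsearch code: `matches_object_field` is a hypothetical helper that models how the `bool`/`must` query above behaves against a plain object field, where each condition only has to be satisfied by some item in the array.

```python
# Simplified model of querying a plain object field: each condition is
# checked against ALL items of the array, not against one item at a time.
orders = {
    "1": [{"itemType": "food", "itemName": "egg"},
          {"itemType": "clothes", "itemName": "T-shirt"}],
    "2": [{"itemType": "food", "itemName": "pork"},
          {"itemType": "poultryEggs", "itemName": "egg"}],
}

def matches_object_field(items, item_type, item_name):
    """Hypothetical helper: True if some item has the type AND some
    (possibly different) item has the name."""
    has_type = any(item["itemType"] == item_type for item in items)
    has_name = any(item["itemName"] == item_name for item in items)
    return has_type and has_name

hits = sorted(doc_id for doc_id, items in orders.items()
              if matches_object_field(items, "food", "egg"))
print(hits)  # ['1', '2']
```

Order 2 is a hit even though none of its items is both "food" and "egg", which is the same behaviour as the query result shown above.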
How is the data flattened?
javascript
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
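Applied to the order data used earlier, the flattening can be sketched as follows. This is a simplified model in plain Python, not Elasticsearch's internal code; `flatten_objects` is a hypothetical helper, and value ordering and analysis are ignored.

```python
def flatten_objects(field, items):
    """Hypothetical helper: collapse a list of objects into one
    multi-value list per dotted field name."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

# The orderItems array of order 2 from the write example.
order_2_items = [
    {"itemType": "food", "itemName": "pork"},
    {"itemType": "poultryEggs", "itemName": "egg"},
]

flat = flatten_objects("orderItems", order_2_items)
print(flat)
# {'orderItems.itemType': ['food', 'poultryEggs'], 'orderItems.itemName': ['pork', 'egg']}
```

After flattening, nothing records that "food" belonged with "pork" rather than with "egg", which is why a search for itemType "food" together with itemName "egg" matches this order.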
Nested documents
https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html
Create the index
javascript
curl --location --request PUT 'https://myhost/order_index' \
--header 'Content-Type: application/json' \
--data '{
"mappings": {
"dynamic": "false",
"properties": {
"orderId": {
"type": "keyword"
},
"orderItems" : {
"type": "nested",
"properties": {
"itemType": {
"type": "keyword"
},
"itemName": {
"type": "keyword"
}
}
}
}
}
}'
Write data
Same as for nested objects.
Read data
javascript
curl --location --request GET 'https://myhost/order_index/_search' \
--header 'Content-Type: application/json' \
--data '{
"query": {
"nested": {
"path": "orderItems",
"query": {
"bool": {
"must": [
{
"match": {
"orderItems.itemType": "food"
}
},
{
"match": {
"orderItems.itemName": "egg"
}
}
]
}
}
}
}
}'
Query result
javascript
{
"took": 580,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.3862942,
"hits": [
{
"_index": "order_index",
"_type": "_doc",
"_id": "1",
"_score": 1.3862942,
"_source": {
"orderId": "1_1",
"orderItems": [
{
"itemType": "food",
"itemName": "egg"
},
{
"itemType": "clothes",
"itemName": "T-shirt"
}
]
}
}
]
}
}
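The difference from the plain object field can be modelled in a few lines of Python. This is only a sketch of the matching semantics, not Elasticsearch code; `matches_nested` is a hypothetical helper that evaluates both conditions against one item at a time, as the `nested` query above does.

```python
# Simplified model of a nested query: every item is evaluated on its own,
# so both conditions must hold for the SAME item.
orders = {
    "1": [{"itemType": "food", "itemName": "egg"},
          {"itemType": "clothes", "itemName": "T-shirt"}],
    "2": [{"itemType": "food", "itemName": "pork"},
          {"itemType": "poultryEggs", "itemName": "egg"}],
}

def matches_nested(items, item_type, item_name):
    """Hypothetical helper: True only if one single item has both
    the requested type and the requested name."""
    return any(item["itemType"] == item_type and item["itemName"] == item_name
               for item in items)

hits = [doc_id for doc_id, items in orders.items()
        if matches_nested(items, "food", "egg")]
print(hits)  # ['1']
```

Only order 1 contains an item that is both "food" and "egg", which agrees with the single hit in the result above.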
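Pros and cons
Advantages: it removes the limitation of nested objects, and the data is more compact, because each nested document is stored independently rather than redundantly in multiple copies.
Disadvantages: multi-level nesting is not well supported, for example a question with child answers where each answer has child votes (the official documentation advises against such structures). A parent with several kinds of children, for example a question with both answers and comments, is not supported either.
For example, with the mapping above (where itemType is a keyword), writing a document that adds another object level under itemType fails: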
javascript
curl --location --request PUT 'https://myhost/order_index/_doc/3' \
--header 'Content-Type: application/json' \
--data '{
"orderId": "3_3",
"orderItems": [{
"itemType": {
"myType": 1
},
"itemName": "egg"
}
]
}'
Error response
javascript
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse field [orderItems.itemType] of type [keyword] in document with id '3'. Preview of field's value: '{myType=1}'"}],"type":"mapper_parsing_exception","reason":"failed to parse field [orderItems.itemType] of type [keyword] in document with id '3'. Preview of field's value: '{myType=1}'","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 4:21"}},"status":400}
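How is the data stored?
As the official documentation describes, each nested object is indexed as a separate document. The two orders in the example above are therefore stored as 6 documents under the hood (2 orders plus 4 order items), and the cat API confirms this count.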
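The count of 6 follows from simple arithmetic, sketched here in Python. `stored_doc_count` is a hypothetical helper, and the model assumes exactly one underlying document per order plus one per nested item.

```python
# Number of orderItems per order id, taken from the two example documents.
item_counts = {"1": 2, "2": 2}

def stored_doc_count(counts):
    """Hypothetical helper: one document per order plus one
    separately indexed document per nested item."""
    return len(counts) + sum(counts.values())

print(stored_doc_count(item_counts))  # 6
```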
Join
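The examples in the official documentation are thorough, so they are not reproduced here. For details, see: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/parent-join.html
Pros and cons
Advantages: one parent document can have multiple child documents (including children with different data structures), and multi-level relationships are also possible (though officially not recommended).
Disadvantages: join is expensive to run and consumes more server resources (CPU and memory).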
Ingest Pipeline
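Relatively costly, since it requires adding ingest nodes; it is not covered further in this article. For details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/json-processor.html
Summary
- properties with nested objects: meets most needs and is simple to implement.
- properties with the nested type: the query syntax differs somewhat. Use this type if searches over a JSON array of child objects require conditions on different fields to match within the same object (an AND relationship).
- Fields of a child properties object can be added dynamically. With dynamic=false, a newly added field cannot be searched; making it searchable requires adding it to the mapping and reindexing. With dynamic=true, a newly added field can be searched right away. For example, in the document below the new itemId field is directly searchable when dynamic=true.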
javascript
curl --location --request PUT 'https://myhost/order_index/_doc/3' \
--header 'Content-Type: application/json' \
--data '{
"orderId": "2_1",
"orderItems": [{
"itemType": "food",
"itemName": "pork",
"itemId": 2
}]
}'
Note: a child object that does not specify dynamic inherits the setting from its parent. For details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/dynamic.html
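The inheritance rule can be illustrated with a small Python model of the mapping. This is a sketch under stated assumptions, not Elasticsearch code: `effective_dynamic` is a hypothetical helper, and it assumes the setting defaults to "true" when no level specifies it.

```python
def effective_dynamic(mapping, path):
    """Hypothetical helper: walk down `path` and return the `dynamic`
    setting in effect; an inner object inherits its parent's value
    unless it declares its own."""
    current = mapping.get("dynamic", "true")  # assumed default when unset
    node = mapping
    for name in path:
        node = node["properties"][name]
        current = node.get("dynamic", current)
    return current

# The "mappings" body of the index created earlier in this article.
mappings = {
    "dynamic": "false",
    "properties": {
        "orderId": {"type": "keyword"},
        "orderItems": {
            "properties": {
                "itemType": {"type": "keyword"},
                "itemName": {"type": "keyword"},
            }
        },
    },
}

print(effective_dynamic(mappings, ["orderItems"]))  # false

# Overriding the setting on the child object only.
mappings["properties"]["orderItems"]["dynamic"] = "true"
print(effective_dynamic(mappings, ["orderItems"]))  # true
```

In this model, orderItems follows the top-level "false" until it declares its own value, at which point a new field such as itemId would be picked up for that object only.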