Designing JSON-Format Fields in an ES Index

properties

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/properties.html

Object fields (nested objects)

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/object.html

Create the index

curl --location --request PUT 'https://myhost/order_index' \
--header 'Content-Type: application/json' \
--data '{
    "mappings": {
        "dynamic": "false",
        "properties": {
            "orderId": {
                "type": "keyword"
            },
            "orderItems": {
                "properties": {
                    "itemType": {
                        "type": "keyword"
                    },
                    "itemName": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}'

Write data

curl --location --request PUT 'https://myhost/order_index/_doc/1' \
--header 'Content-Type: application/json' \
--data '{
    "orderId": "1_1",
    "orderItems": [{
        "itemType": "food",
        "itemName": "egg"
    },
    {
        "itemType": "clothes",
        "itemName": "T-shirt"
    }
    ]
}'

curl --location --request PUT 'https://myhost/order_index/_doc/2' \
--header 'Content-Type: application/json' \
--data '{
    "orderId": "2_2",
    "orderItems": [{
        "itemType": "food",
        "itemName": "pork"
    },
    {
        "itemType": "poultryEggs",
        "itemName": "egg"
    }
    ]
}'

Read data

curl --location --request GET 'https://myhost/order_index/_search' \
--header 'Content-Type: application/json' \
--data '{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "orderItems.itemType":"food"
                    }
                },
                {
                    "match":{
                        "orderItems.itemName":"egg"
                    }
                }
            ]
        }
    }
}'

Query result

{
    "took": 763,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.723315,
        "hits": [
            {
                "_index": "order_index",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.723315,
                "_source": {
                    "orderId": "2_2",
                    "orderItems": [
                        {
                            "itemType": "food",
                            "itemName": "pork"
                        },
                        {
                            "itemType": "poultryEggs",
                            "itemName": "egg"
                        }
                    ]
                }
            },
            {
                "_index": "order_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.723315,
                "_source": {
                    "orderId": "1_1",
                    "orderItems": [
                        {
                            "itemType": "food",
                            "itemName": "egg"
                        },
                        {
                            "itemType": "clothes",
                            "itemName": "T-shirt"
                        }
                    ]
                }
            }
        ]
    }
}

Pros and cons

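  1. Disadvantage: the association between the fields inside each item object is lost (the official documentation explains this: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html). Conditions on two fields are matched independently across the whole array rather than within a single item, so the search above wrongly returns order 2, where one item has "orderItems.itemType":"food" and a different item has "orderItems.itemName":"egg".
  2. Advantage: the underlying implementation is simple. In a one-to-one scenario the drawback above does not apply, so the query correctly returns only documents whose single item has both "orderItems.itemType":"food" and "orderItems.itemName":"egg".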

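Tip: if order and orderItems have a one-to-one relationship in the database, object fields work without any problem. If the relationship is one-to-many, searches like the one above return wrong results, and one of the approaches below is needed.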

How is the data flattened?

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}
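
To make the flattening concrete for the order example, here is a minimal Python sketch. It only imitates the behaviour described in the official documentation; `flatten_object_array` and `matches_flattened` are hypothetical helper names, not Elasticsearch APIs.

```python
from collections import defaultdict


def flatten_object_array(doc, field):
    """Imitate how an array of objects is flattened into multi-value fields."""
    flat = defaultdict(list)
    for item in doc.get(field, []):
        for key, value in item.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches_flattened(flat, conditions):
    """Each condition is checked against the whole value list, so the
    matching values may come from different objects in the original array."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


order_1 = {"orderId": "1_1", "orderItems": [
    {"itemType": "food", "itemName": "egg"},
    {"itemType": "clothes", "itemName": "T-shirt"},
]}
order_2 = {"orderId": "2_2", "orderItems": [
    {"itemType": "food", "itemName": "pork"},
    {"itemType": "poultryEggs", "itemName": "egg"},
]}

query = {"orderItems.itemType": "food", "orderItems.itemName": "egg"}
flat_1 = flatten_object_array(order_1, "orderItems")
flat_2 = flatten_object_array(order_2, "orderItems")
print(flat_2)
print(matches_flattened(flat_1, query), matches_flattened(flat_2, query))
```

Both orders match in this sketch, even though order 2 has no single item that is both food and egg, which mirrors the two hits in the query result above.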

Nested fields (nested documents)

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html

Create the index

curl --location --request PUT 'https://myhost/order_index' \
--header 'Content-Type: application/json' \
--data '{
    "mappings": {
        "dynamic": "false",
        "properties": {
            "orderId": {
                "type": "keyword"
            },
            "orderItems": {
                "type": "nested",
                "properties": {
                    "itemType": {
                        "type": "keyword"
                    },
                    "itemName": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}'

Write data

Same as for object fields.

Read data

curl --location --request GET 'https://myhost/order_index/_search' \
--header 'Content-Type: application/json' \
--data '{
    "query": {
        "nested": {
            "path": "orderItems",
            "query": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "orderItems.itemType": "food"
                            }
                        },
                        {
                            "match": {
                                "orderItems.itemName": "egg"
                            }
                        }
                    ]
                }
            }
        }
    }
}'

Query result

{
    "took": 580,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.3862942,
        "hits": [
            {
                "_index": "order_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.3862942,
                "_source": {
                    "orderId": "1_1",
                    "orderItems": [
                        {
                            "itemType": "food",
                            "itemName": "egg"
                        },
                        {
                            "itemType": "clothes",
                            "itemName": "T-shirt"
                        }
                    ]
                }
            }
        ]
    }
}

Pros and cons

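Advantage: it removes the limitation of object fields. Each nested object is indexed as its own separate document, so the association between fields of the same object is preserved, and the data is not stored redundantly in multiple copies.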
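
The difference in matching can be imitated in a few lines of Python. This is only a sketch of the semantics, assuming every condition is an exact keyword match; `matches_nested` is a hypothetical helper, not part of any Elasticsearch client.

```python
def matches_nested(doc, path, conditions):
    """Imitate a nested query: all conditions must hold within ONE object."""
    return any(
        all(item.get(key) == value for key, value in conditions.items())
        for item in doc.get(path, [])
    )


order_1 = {"orderId": "1_1", "orderItems": [
    {"itemType": "food", "itemName": "egg"},
    {"itemType": "clothes", "itemName": "T-shirt"},
]}
order_2 = {"orderId": "2_2", "orderItems": [
    {"itemType": "food", "itemName": "pork"},
    {"itemType": "poultryEggs", "itemName": "egg"},
]}

conditions = {"itemType": "food", "itemName": "egg"}
hits = [d["orderId"] for d in (order_1, order_2)
        if matches_nested(d, "orderItems", conditions)]
print(hits)
```

Only order 1_1 is returned here, matching the single hit in the nested query result above.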

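Disadvantages: every level of structure must be declared in the mapping up front. The two-field mapping above cannot hold a multi-level hierarchy (for example, a question with child answers, and answers with child votes) or a parent with several kinds of children (for example, a question with both answers and comments), and deeper nesting adds indexing and query cost.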

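For example, writing a document in which itemType is an object instead of a string is rejected, because the mapping declares itemType as keyword: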

curl --location --request PUT 'https://myhost/order_index/_doc/3' \
--header 'Content-Type: application/json' \
--data '{
    "orderId": "3_3",
    "orderItems": [{
        "itemType": {
            "myType": 1
        },
        "itemName": "egg"
    }
    ]
}'

Error response

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse field [orderItems.itemType] of type [keyword] in document with id '3'. Preview of field's value: '{myType=1}'"}],"type":"mapper_parsing_exception","reason":"failed to parse field [orderItems.itemType] of type [keyword] in document with id '3'. Preview of field's value: '{myType=1}'","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 4:21"}},"status":400}

How is the data stored?

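The official documentation explains that each nested object is indexed as a separate document. The two orders above, each with two items, are therefore stored as 6 documents underneath (2 parent documents plus 4 nested items), and the cat API confirms this count.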
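
The arithmetic behind that count, as a small Python sketch (the helper name `stored_doc_count` is hypothetical and the function only counts; it does not query a cluster):

```python
def stored_doc_count(docs, nested_path):
    """One stored document per parent plus one per nested object."""
    return sum(1 + len(doc.get(nested_path, [])) for doc in docs)


orders = [
    {"orderId": "1_1", "orderItems": [
        {"itemType": "food", "itemName": "egg"},
        {"itemType": "clothes", "itemName": "T-shirt"},
    ]},
    {"orderId": "2_2", "orderItems": [
        {"itemType": "food", "itemName": "pork"},
        {"itemType": "poultryEggs", "itemName": "egg"},
    ]},
]

total = stored_doc_count(orders, "orderItems")
print(total)
```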

Join

The example in the official documentation is already very detailed, so it is not reproduced here. See: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/parent-join.html

Pros and cons

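Advantage: one parent document can have multiple child documents, including children with different data structures, and multi-level relations are also supported (although the official documentation does not recommend them).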

Disadvantage: join is expensive to run and consumes more cluster resources, in particular CPU and memory.
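
For orientation, the request bodies for the "one question with answers and comments" case might look like the following Python sketch. The field name `qa_join`, the `text` field and the helper functions are assumptions made for illustration; the official page linked above remains the authoritative reference.

```python
import json

# Hypothetical field name, chosen only for illustration.
JOIN_FIELD = "qa_join"

index_body = {
    "mappings": {
        "properties": {
            JOIN_FIELD: {
                "type": "join",
                # One parent type with two different child types.
                "relations": {"question": ["answer", "comment"]},
            },
            "text": {"type": "text"},
        }
    }
}


def parent_doc(text):
    """Body for a parent document: it only names its relation."""
    return {"text": text, JOIN_FIELD: {"name": "question"}}


def child_doc(relation, parent_id, text):
    """Body and routing value for a child document.

    A child has to live on the same shard as its parent, so the index
    request carries routing=<parent id>.
    """
    body = {"text": text, JOIN_FIELD: {"name": relation, "parent": parent_id}}
    return body, parent_id


answer_body, answer_routing = child_doc("answer", "1", "This is an answer")
print(json.dumps(index_body, indent=2))
print(json.dumps(answer_body), "routing =", answer_routing)
```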

Ingest Pipeline

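This approach costs more because it needs nodes with the ingest role, so this article does not cover it further. See the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/json-processor.html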
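
As a rough idea of what the json processor does, the sketch below imitates it locally in Python. The source field `orderItemsRaw`, the pipeline description and the helper `apply_json_processor` are hypothetical; only the processor name `json` and its `field` / `target_field` options are taken from the linked documentation.

```python
import json


def apply_json_processor(doc, field, target_field=None):
    """Parse the JSON string stored in `field` into a structured value.

    When no target field is given, the parsed value replaces the original
    string, which is my understanding of the processor's default.
    """
    result = dict(doc)
    result[target_field or field] = json.loads(doc[field])
    return result


# Pipeline definition body as it might be sent to the ingest pipeline API.
pipeline_body = {
    "description": "parse orderItems from a JSON string",
    "processors": [
        {"json": {"field": "orderItemsRaw", "target_field": "orderItems"}}
    ],
}

raw = {"orderId": "1_1", "orderItemsRaw": '[{"itemType": "food", "itemName": "egg"}]'}
parsed = apply_json_processor(raw, "orderItemsRaw", "orderItems")
print(parsed["orderItems"])
```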

Summary

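  1. Object fields under properties cover most needs and are simple to implement.
  2. Nested fields under properties use a slightly different query syntax. Use this type when searches over an array of JSON objects require the conditions on several fields to match within the same object (an AND relationship).
  3. Fields can be added to sub-properties dynamically. With dynamic=false, a newly added field is kept in _source but cannot be searched; making it searchable requires updating the mapping and reindexing. With dynamic=true, a newly added field is searchable right away. In the example below, with dynamic=true, the new itemId field can be searched directly.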
curl --location --request PUT 'https://myhost/order_index/_doc/3' \
--header 'Content-Type: application/json' \
--data '{
    "orderId": "2_1",
    "orderItems": [{
        "itemType": "food",
        "itemName": "pork",
        "itemId": 2
    }]
}'

Note: a sub-object that does not specify dynamic inherits the setting from its parent. See the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/dynamic.html
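
The inheritance rule can be sketched as a small resolver over a mapping body. This is an illustration only: `effective_dynamic` is a hypothetical helper, and it assumes the mapping is a plain Python dict shaped like the "mappings" section used in this article.

```python
def effective_dynamic(mapping, path):
    """Resolve the dynamic setting that applies to the object at `path`.

    Walks from the mapping root down the dotted path; an inner object that
    does not set `dynamic` keeps the value inherited from its parent.
    """
    current = mapping
    setting = current.get("dynamic", "true")  # Elasticsearch defaults to true
    for name in path.split("."):
        current = current.get("properties", {}).get(name, {})
        setting = current.get("dynamic", setting)
    return str(setting).lower()


inherited = {
    "dynamic": "false",
    "properties": {
        "orderId": {"type": "keyword"},
        "orderItems": {"properties": {"itemType": {"type": "keyword"}}},
    },
}
overridden = {
    "dynamic": "false",
    "properties": {
        "orderItems": {"dynamic": True,
                       "properties": {"itemType": {"type": "keyword"}}},
    },
}

print(effective_dynamic(inherited, "orderItems"))
print(effective_dynamic(overridden, "orderItems"))
```

In the first mapping orderItems inherits "false" from the root, so a new itemId would not be searchable; in the second the inner object overrides it with true.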
