Elasticsearch (5): Mappings

Mappings


# curl -XPUT node3:9200/books?pretty
{
  "acknowledged" : true
}

# curl node3:9200/books/_mapping?pretty
{
  "books" : {
    "mappings" : { }
  }
}

# curl -XPOST node3:9200/books/it/1?pretty -d '{
"id":1,
"publish_date":"2017-06-01",
"name":"master Elasticsearch"
}'
{
  "_index" : "books",
  "_type" : "it",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}


# curl node3:9200/books/_mapping?pretty
{
  "books" : {
    "mappings" : {
      "it" : {
        "properties" : {
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "string"
          },
          "publish_date" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          }
        }
      }
    }
  }
}

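If you use ES as your primary data store and want an exception raised whenever an unknown field appears, so that the problem is brought to your attention, leaving dynamic mapping enabled is not appropriate. The dynamic setting in a mapping controls whether new fields are added automatically. It accepts the following values: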

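true: the default; new fields are added to the mapping automatically
false: new fields are ignored
strict: strict mode; an exception is thrown when a new field is encountered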
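
To make the three settings concrete, here is a minimal Python sketch that imitates how each value treats an unknown field. It is a toy model, not Elasticsearch code: the function apply_dynamic, the exception class, and the representation of a mapping as a set of field names are all assumptions made for illustration.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for strict_dynamic_mapping_exception."""


def apply_dynamic(mapping, doc, dynamic="true"):
    """Return the names of the fields of `doc` that would be indexed.

    `mapping` is a set of known field names (a simplification); it is
    extended in place only when dynamic is "true".
    """
    indexed = set()
    for field in doc:
        if field in mapping:
            indexed.add(field)
        elif dynamic == "true":
            mapping.add(field)      # new field is added to the mapping
            indexed.add(field)
        elif dynamic == "false":
            continue                # new field is ignored
        elif dynamic == "strict":
            raise StrictDynamicMappingError(
                f"dynamic introduction of [{field}] is not allowed"
            )
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return indexed


if __name__ == "__main__":
    doc = {"title": "master Elasticsearch", "author": "Tom"}
    print(apply_dynamic({"title", "publish_date"}, doc, "true"))
    print(apply_dynamic({"title", "publish_date"}, doc, "false"))
```

With "strict", the same document raises the exception instead of being indexed, which mirrors the 400 response returned for an unmapped author field.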

curl -XDELETE node3:9200/books?pretty

{
"acknowledged" : true
}

curl -XPOST node3:9200/books?pretty -d '{

"mappings": {
"it":{
"dynamic":"strict",
"properties": {
"title":{
"type":"string"
},
"publish_date":{
"type":"date"
}
}
}
}
}'

curl node3:9200/books/_mapping?pretty

{
"books" : {
"mappings" : {
"it" : {
"dynamic" : "strict",
"properties" : {
"publish_date" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"title" : {
"type" : "string"
}
}
}
}
}
}

curl -XPOST node2:9200/books/it/1?pretty -d '{

"title":"master Elasticsearch",
"publish_date":"2017-06-01"
}'
{
"_index" : "books",
"_type" : "it",
"_id" : "1",
"_version" : 3,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"created" : true
}

curl -XPOST node2:9200/books/it/2?pretty -d '{

"title":"master Elasticsearch"
}'
{
"_index" : "books",
"_type" : "it",
"_id" : "2",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"created" : true
}

curl -XPOST node2:9200/books/it/3?pretty -d '{

"title":"master Elasticsearch",
"publish_date":"2017-06-01",
"author":"Tom"
}'
{
"error" : {
"root_cause" : [ {
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [author] within [it] is not allowed"
} ],
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [author] within [it] is not allowed"
},
"status" : 400
}

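When ES encounters a new string field, it checks whether the string contains a recognizable date. If it looks like a date, such as 2017-09-12, the field is added as a date field; otherwise it is added as a string field. This can cause problems. For example:

{"note":"2017-09-12"}

is recognized as a date the first time, but if the next document is:

{"note":"Logged out"}

indexing fails with an exception, because "Logged out" cannot be parsed as a date. Date detection can be turned off by setting date_detection to false on the root object: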


# curl -XPOST node2:9200/my_index?pretty
{
  "acknowledged" : true
}


# curl node2:9200/my_index/_mapping?pretty
{
  "my_index" : {
    "mappings" : { }
  }
}

# curl -XPOST node2:9200/my_index1?pretty -d'{
"mappings":{
 "my_type":{
   "date_detection":false
}
 }
}'
{
  "acknowledged" : true
}

# curl node2:9200/my_index1/_mapping?pretty
{
  "my_index1" : {
    "mappings" : {
      "my_type" : {
        "date_detection" : false
      }
    }
  }
}
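
The failure mode described above can be reproduced in a few lines of Python. This is only a sketch under simplifying assumptions: looks_like_date checks a single yyyy-MM-dd pattern, whereas Elasticsearch tries its configured dynamic date formats, and the function names are hypothetical.

```python
from datetime import datetime


def looks_like_date(value):
    """Simplified date check: accepts only the yyyy-MM-dd pattern."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def index_notes(values, date_detection=True):
    """Fix the field type from the first value, then validate the rest.

    Returns the inferred type, or raises ValueError when a later value
    does not fit a field that was already mapped as a date.
    """
    field_type = None
    for value in values:
        if field_type is None:
            is_date = date_detection and looks_like_date(value)
            field_type = "date" if is_date else "string"
        if field_type == "date" and not looks_like_date(value):
            raise ValueError(f"failed to parse {value!r} as a date")
    return field_type


if __name__ == "__main__":
    notes = ["2017-09-12", "Logged out"]
    print(index_notes(notes, date_detection=False))
    try:
        index_notes(notes, date_detection=True)
    except ValueError as err:
        print("rejected:", err)
```

With detection disabled, both notes are accepted as plain strings; with detection enabled, the second note is rejected.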

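Static mapping

With static mapping, you specify the index mapping by hand when the index is created, much as you declare column attributes in SQL when creating a table.

A static mapping is more detailed and more precise than a dynamically generated one.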


# curl -XPOST node2:9200/my_index?pretty -d '{
  "mappings":{
    "user":{
      "_all":{"enabled":false},
      "properties":{
        "title":{"type":"string"},
        "name":{"type":"string"}, 
        "age":{"type":"integer"}
      }
    },
    "blogpost":{
      "_all":{"enabled":false}, 
      "properties":{
        "title":{"type":"string"},
        "body":{"type":"string"},
        "user_id":{"type":"string"},
        "created":{
          "type":"date",
          "format":"strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'
{
  "acknowledged" : true
}

# curl node2:9200/my_index/_mapping?pretty
{
  "my_index" : {
    "mappings" : {
      "blogpost" : {
        "_all" : {
          "enabled" : false
        },
        "properties" : {
          "body" : {
            "type" : "string"
          },
          "created" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "title" : {
            "type" : "string"
          },
          "user_id" : {
            "type" : "string"
          }
        }
      },
      "user" : {
        "_all" : {
          "enabled" : false
        },
        "properties" : {
          "age" : {
            "type" : "integer"
          },
          "name" : {
            "type" : "string"
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

Field types

| JSON data | Inferred field type |
|-------------------|--------------------------------------------------|
| null | No field is added |
| true or false | boolean |
| Floating-point number | float |
| Integer | long |
| JSON object | object |
| Array | Determined by the first non-null value in the array |
| string | May be date (with date detection enabled), double or long, text, or keyword |
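
The inference rules in the table can be written down as a small Python function. This is a simplified sketch, not the actual Elasticsearch implementation: infer_field_type is a hypothetical name, and strings are always reported as text here, leaving out date detection, numeric detection, and the keyword case.

```python
import json


def infer_field_type(value):
    """Map a parsed JSON value to the type named in the table above.

    Returns None when no field would be added (null or an array with
    no non-null value).
    """
    if value is None:
        return None
    if isinstance(value, bool):      # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, float):
        return "float"
    if isinstance(value, int):
        return "long"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:           # first non-null element decides
            item_type = infer_field_type(item)
            if item_type is not None:
                return item_type
        return None
    if isinstance(value, str):
        return "text"                # simplification: detection rules omitted
    raise TypeError(f"not a JSON value: {value!r}")


if __name__ == "__main__":
    doc = json.loads('{"id": 1, "ok": true, "tags": [null, 2.5], "meta": {}}')
    for name, value in doc.items():
        print(name, "->", infer_field_type(value))
```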

ES field types fall into four groups: core types, complex types, geo types, and specialized types.

| Category | Subcategory | Types |
|------|--------|--------------------------------------------------------------|
| Core types | String | string, text, keyword |
| Core types | Numeric | long, integer, short, byte, double, float, half_float, scaled_float |
| Core types | Date | date |
| Core types | Boolean | boolean |
| Core types | Binary | binary |
| Core types | Range | range |
| Complex types | Array | array |
| Complex types | Object | object |
| Complex types | Nested | nested |
| Geo types | Geo-point | geo_point |
| Geo types | Geo-shape | geo_shape |
| Specialized types | IP | ip |
| Specialized types | Completion | completion |
| Specialized types | Token count | token_count |
| Specialized types | Attachment | attachment |
| Specialized types | Percolator | percolator |

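A. string

From ES 5.x onward the string type is no longer supported; it is replaced by text and keyword.

B. text

Use this type for any field that needs full-text search. The content of a text field is analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting.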


PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

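C. keyword

Suited to indexing structured content such as email addresses, hostnames, status codes, and tags. keyword fields are typically used for filtering, sorting, and aggregations. Unlike text fields, they can only be matched by their exact value.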
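
The difference between the two types comes down to what ends up in the inverted index. The Python sketch below illustrates this with a deliberately crude analyzer (lowercase, split on whitespace); the real analyzers are more elaborate, and build_index is a hypothetical helper, not an Elasticsearch API.

```python
from collections import defaultdict


def build_index(docs, field_type):
    """Build a toy inverted index: term -> set of document ids.

    "text" values are analyzed into lowercase terms; "keyword" values
    are indexed as one unmodified term.
    """
    index = defaultdict(set)
    for doc_id, value in docs.items():
        if field_type == "text":
            terms = value.lower().split()   # crude stand-in for an analyzer
        else:
            terms = [value]                 # exact value, not analyzed
        for term in terms:
            index[term].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "Master Elasticsearch", 2: "Elasticsearch in Action"}
    text_index = build_index(docs, "text")
    keyword_index = build_index(docs, "keyword")
    print(sorted(text_index["elasticsearch"]))
    print(sorted(keyword_index.get("elasticsearch", set())))
    print(sorted(keyword_index["Master Elasticsearch"]))
```

Searching the text index for a single word finds both documents, while the keyword index only answers to the complete original value.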

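D. Numeric types

| Type | Range / precision |
|---------|----------------|
| long | -2^63 to 2^63-1 |
| integer | -2^31 to 2^31-1 |
| short | -32768 to 32767 |
| byte | -128 to 127 |
| double | 64-bit double-precision IEEE 754 floating point |
| float | 32-bit single-precision IEEE 754 floating point |
| half_float | 16-bit half-precision IEEE 754 floating point |
| scaled_float | Floating-point number stored as a scaled integer |

When handling floating-point numbers, consider scaled_float first. scaled_float uses a scaling factor to turn the floating-point value into a long. For a price accurate to the cent, for example, a scaling factor of 100 means the stored value is an integer. All APIs still treat the price as a floating-point number, while ES stores an integer underneath, because integers compress better than floating-point numbers and so take less storage.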


PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": {"type": "integer"},
        "time_in_seconds": {"type": "float"},
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}
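
The scaling idea behind scaled_float amounts to one multiplication and one rounding step. The Python sketch below shows the arithmetic only; the function names are hypothetical, and the way exact ties are rounded here (Python's round) is an assumption that may not match Elasticsearch.

```python
def encode_scaled_float(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer."""
    return round(value * scaling_factor)


def decode_scaled_float(stored, scaling_factor):
    """Recover the floating-point value that the APIs would present."""
    return stored / scaling_factor


if __name__ == "__main__":
    stored = encode_scaled_float(19.99, 100)
    print(stored)                            # integer kept in storage
    print(decode_scaled_float(stored, 100))  # value seen through the API
```

With a factor of 100, anything finer than a cent is lost at index time: 19.994 and 19.99 are stored as the same integer.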

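E. date

A date in ES can take any of the following forms:

A formatted date string, such as 2015-01-01 or 2015/01/01 12:10:30

A long number of milliseconds counted from 00:00:00 on January 1, 1970

An integer number of seconds counted from 00:00:00 on January 1, 1970

The default format is "strict_date_optional_time||epoch_millis".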


PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{"date": "2015-01-01"}

PUT my_index/my_type/2
{"date": "2015-01-01T12:10:30Z"}

PUT my_index/my_type/3
{"date": 1420070400001}

All three forms are recognized. Internally, ES stores the date as a long number of milliseconds.
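
To see what the three documents look like once normalized, the following Python sketch converts each value to milliseconds since the epoch. It is an illustration only: to_epoch_millis is a hypothetical helper, it handles just the ISO-style strings used above, and it assumes UTC when the string carries no time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Normalize an ISO date string or a millisecond count to epoch millis."""
    if isinstance(value, int):
        return value                          # already milliseconds
    # Older Python versions reject a trailing "Z" in fromisoformat.
    text = value.replace("Z", "+00:00")
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)   # assumption: UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    for value in ("2015-01-01", "2015-01-01T12:10:30Z", 1420070400001):
        print(value, "->", to_epoch_millis(value))
```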

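ES meta-fields

| Category | Meta-field | Purpose |
|---------------|--------------|-------------------|
| Document identity | _index | The index the document belongs to |
| Document identity | _uid | Composite field made up of _type and _id |
| Document identity | _type | The document's type |
| Document identity | _id | The document's ID |
| Document source | _source | The original JSON string of the document |
| Document source | _size | Size of the _source field |
| Indexing | _all | Catch-all field containing all fields of the document |
| Indexing | _field_names | All fields in the document that contain non-null values |
| Routing | _parent | Defines a parent-child relationship between documents |
| Routing | _routing | Custom routing value that routes a document to a particular shard |
| Custom | _meta | Custom metadata |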

_index

The _index field supports term queries, terms queries, aggregations, scripts, and sorting on the index name. It does not support prefix, wildcard, regexp, or fuzzy queries.


# curl -XPUT node3:9200/index_1/my_type/1?pretty -d '{
 "text":"Document in index 1"
}'
{
  "_index" : "index_1",
  "_type" : "my_type",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

# curl -XPUT node3:9200/index_2/my_type/2?refresh=true -d '{
"text":"Document in index 2"
}'
{"_index":"index_2","_type":"my_type","_id":"2","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"created":true}

# curl node3:9200/index_1,index_2/_search?pretty -d '{
   "query":{
     "terms":{"_index":["index_1", "index_2"]}
   },
   "aggs":{
     "indices":{
       "terms":{
       "field":"_index",
         "size":10
       }
     }
   },
   "sort":[
     {
       "_index":{
         "order":"asc"
       }
     }
   ]
}'
{
  "took" : 105,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "index_1",
      "_type" : "my_type",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "text" : "Document in index 1"
      },
      "sort" : [ "index_1" ]
    }, {
      "_index" : "index_2",
      "_type" : "my_type",
      "_id" : "2",
      "_score" : null,
      "_source" : {
        "text" : "Document in index 2"
      },
      "sort" : [ "index_2" ]
    } ]
  },
  "aggregations" : {
    "indices" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "index_1",
        "doc_count" : 1
      }, {
        "key" : "index_2",
        "doc_count" : 1
      } ]
    }
  }
}

_type

Every indexed document has a _type field and an _id field. The _type field can be used in queries, aggregations, scripts, and sorting.


# curl -XPUT node2:9200/my_index/type_1/1?pretty -d '{
 "text":"Document with type 1"
}'
{
  "_index" : "my_index",
  "_type" : "type_1",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}

# curl -XPUT node2:9200/my_index/type_2/2?pretty -d '{
"text":"Document with type 2"
}'
{
  "_index" : "my_index",
  "_type" : "type_2",
  "_id" : "2",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}


# curl node3:9200/my_index/_search?pretty -d '{
   "query":{
     "terms":{
       "_type":["type_1", "type_2"]
     }
   },
  "aggs":{
    "types":{
      "terms":{
        "field":"_type",
        "size":"10"
      }
    }
  },
  "sort":[
    {
      "_type":{
        "order":"desc"
      }
    }
  ]
}'
{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "type_2",
      "_id" : "2",
      "_score" : null,
      "_source" : {
        "text" : "Document with type 2"
      },
      "sort" : [ "type_2" ]
    }, {
      "_index" : "my_index",
      "_type" : "type_1",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "text" : "Document with type 1"
      },
      "sort" : [ "type_1" ]
    } ]
  },
  "aggregations" : {
    "types" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "type_1",
        "doc_count" : 1
      }, {
        "key" : "type_2",
        "doc_count" : 1
      } ]
    }
  }
}

_id

The _id field can be used in term, terms, match, query_string, and simple_query_string queries, but not in aggregations, scripts, or sorting.


# curl node2:9200/my_index/_search?pretty -d '{
"query":{
"terms":{"_id":["1", "2"]}
}
}'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.35355338,
    "hits" : [ {
      "_index" : "my_index",
      "_type" : "type_2",
      "_id" : "2",
      "_score" : 0.35355338,
      "_source" : {
        "text" : "Document with type 2"
      }
    }, {
      "_index" : "my_index",
      "_type" : "type_1",
      "_id" : "1",
      "_score" : 0.35355338,
      "_source" : {
        "text" : "Document with type 1"
      }
    } ]
  }
}
