一、概述
Elastic normalizer是Elasticsearch中用于处理keyword类型字段的一种工具,主要用于对字段进行规范化处理,确保在索引和查询时保持一致性。
Normalizer与analyzer类似,都是对字段进行处理,但normalizer不会对字段进行分词,即没有tokenizer。它主要用于keyword类型的字段(不能再其他字段设置normalizer),可以在索引和查询时对字段值进行额外的处理,如转换为小写。例如,可以使用normalizer将字段值转换为小写,这在处理大小写不敏感的查询时非常有用。
二、normalizer的属性
normalizer仅仅有 char filters和token filters,具有的filter为:arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, pattern_replace, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, trim, uppercase.
其中lowercase为Elasticsearch内置filter,其他的filter需要自定义配置。
自定义的chat filter和filter:
PUT index
{
"settings": {
"analysis": {
"char_filter": {
"quote": {
"type": "mapping",
"mappings": [
"<< => \"",
">> => \""
]
}
},
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": ["quote"],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
三、验证只有keyword类型可以设置normalizer
创建如下mapping,并将类型为text的name字段设置上normalizer
PUT test_index
{
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
},
"title": {
"type": "text",
"analyzer": "standard",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
},
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"filter": ["lowercase"],
"char_filter": []
}
},
"analyzer": {
"my_analyzer": {
"filter": ["lowercase"],
"tokenizer": "standard"
}
}
}
}
}
提示如下错误信息: