Elasticsearch Analyzer Installation (IK + Pinyin)
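1. About This Document
- Environment: ES 7.17.23 + CentOS 7.4
- Plugins covered: IK (Chinese word segmentation) and Pinyin (Chinese-to-pinyin conversion)
- Requirements: run every command in a Linux terminal on the ES server. Do not open the ES links in a browser: they are curl requests, most of them POST/PUT with a JSON body, which a browser address bar cannot send.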
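2. Hard Rules (read first to avoid errors)
- The plugin version must exactly match the ES version: 7.17.23
- Elasticsearch refuses to run as root, so after installing a plugin you must chown its directory to the elasticsearch user
- Custom pinyin analyzers (with your own analyzer names) must be defined in the index settings or an index template, not in elasticsearch.yml
- Do not open any 127.0.0.1:9200 link in a browser; run the requests with curl in a terminal on the ES server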
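A version mismatch can be caught before restarting ES by reading the version each unpacked plugin declares. The sketch below assumes the standard `plugin-descriptor.properties` file with an `elasticsearch.version=` line inside every plugin directory; the function name `check_plugin_version` is hypothetical.

```bash
# check_plugin_version PLUGIN_DIR EXPECTED_VERSION  (hypothetical helper)
# Reads elasticsearch.version= from the plugin descriptor and compares it
# with the ES version you are running.
check_plugin_version() {
  local plugin_dir="$1" expected="$2" actual
  actual=$(grep -E '^elasticsearch\.version=' "$plugin_dir/plugin-descriptor.properties" 2>/dev/null \
    | cut -d= -f2 | tr -d '[:space:]')   # strip newline and any CR
  if [ "$actual" = "$expected" ]; then
    echo "OK   $plugin_dir ($actual)"
  else
    echo "FAIL $plugin_dir (found '${actual:-none}', expected $expected)"
    return 1
  fi
}
```

For example: `for p in /usr/share/elasticsearch/plugins/ik /usr/share/elasticsearch/plugins/pinyin; do check_plugin_version "$p" 7.17.23; done`.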
3. Full Installation Steps
3.1 Go to the ES plugins directory
bash
cd /usr/share/elasticsearch/plugins
3.2 Install the IK Chinese analyzer
Release download site: https://release.infinilabs.com/
bash
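# Download the IK build that matches your ES version
wget https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-7.17.23.zip
# Unzip into the ik folder
unzip elasticsearch-analysis-ik-7.17.23.zip -d ik
# Remove the zip: ES treats every entry under plugins/ as a plugin and fails to start if it is left here
rm -f elasticsearch-analysis-ik-7.17.23.zip
# Give ownership to the elasticsearch user
chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/plugins/ik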
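3.3 Install the Pinyin analyzer
bash
# Download the pinyin plugin
wget https://release.infinilabs.com/analysis-pinyin/stable/elasticsearch-analysis-pinyin-7.17.23.zip
# Unzip into the pinyin folder
unzip elasticsearch-analysis-pinyin-7.17.23.zip -d pinyin
# Remove the zip so it is not picked up as a plugin
rm -f elasticsearch-analysis-pinyin-7.17.23.zip
# Give ownership to the elasticsearch user
chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/plugins/pinyin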
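Steps 3.2 and 3.3 differ only in the plugin name, so they can be folded into one function. This is a sketch: `install_plugin`, `plugin_zip_url`, `run`, `DRY_RUN`, and `PLUGINS_DIR` are hypothetical names, and the URL pattern is only inferred from the two links above, so it may not hold for every version. The zip is downloaded to a temporary directory instead of plugins/, because ES expects only plugin directories there.

```bash
PLUGINS_DIR="${PLUGINS_DIR:-/usr/share/elasticsearch/plugins}"

# ASSUMPTION: same pattern as the IK and pinyin links in 3.2 / 3.3
plugin_zip_url() {
  local name="$1" version="$2"
  echo "https://release.infinilabs.com/analysis-${name}/stable/elasticsearch-analysis-${name}-${version}.zip"
}

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# install_plugin NAME VERSION   e.g. install_plugin ik 7.17.23
install_plugin() {
  local name="$1" version="$2" url zip
  url=$(plugin_zip_url "$name" "$version")
  zip="${TMPDIR:-/tmp}/elasticsearch-analysis-${name}-${version}.zip"   # outside plugins/
  run wget -O "$zip" "$url" || return 1
  run unzip -o "$zip" -d "$PLUGINS_DIR/$name" || return 1
  run rm -f "$zip"
  run chown -R elasticsearch:elasticsearch "$PLUGINS_DIR/$name"
}
```

Set `DRY_RUN=1` to print the commands without executing them, e.g. `DRY_RUN=1 install_plugin ik 7.17.23`; for a real install, run it as root so `chown` succeeds.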
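3.4 ES 7.x hard don'ts (read first)
- Do not put index.analysis.* in elasticsearch.yml: index-level settings in the node config make ES fail to start.
- Analyzers cannot be added to an open index. Either close the index, update its analysis settings, and reopen it, or delete the index and recreate it with the analyzer.
- A custom analyzer cannot be tested with the bare /_analyze endpoint; call it through the index that defines it (/index_name/_analyze).
- Index templates only apply to newly created indices; existing indices do not inherit them.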
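The first rule can be checked before a restart. A minimal sketch, assuming the RPM default config path `/etc/elasticsearch/elasticsearch.yml` (pass another path as the first argument); the name `check_yml` is hypothetical, and it only detects the flat `index.analysis...` form, not a nested `index:` / `analysis:` YAML block.

```bash
# check_yml [PATH]  (hypothetical helper)
# Prints offending lines and returns 1 if index.analysis settings are found.
check_yml() {
  local yml="${1:-/etc/elasticsearch/elasticsearch.yml}"
  if [ ! -r "$yml" ]; then
    echo "cannot read $yml" >&2
    return 2
  fi
  if grep -nE '^[[:space:]]*index\.analysis' "$yml"; then
    echo "ERROR: remove the index.analysis lines above from $yml" >&2
    return 1
  fi
  return 0
}
```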
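3.5 Restart ES
bash
# If ES is managed by supervisor:
supervisorctl restart elasticsearch
# If ES was installed from the RPM and runs as a systemd service:
# systemctl restart elasticsearch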
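ES needs some time to come back after a restart, so running the check in 3.6 immediately may fail with Connection refused. A sketch of a polling helper (hypothetical name `wait_for_es`), assuming the default address and a 60-second limit as overridable defaults; it relies on the cluster health API's `wait_for_status` parameter.

```bash
# wait_for_es [URL] [TIMEOUT_SECONDS]  (hypothetical helper)
wait_for_es() {
  local url="${1:-http://127.0.0.1:9200}" timeout="${2:-60}"
  local deadline=$((SECONDS + timeout))
  # -f makes curl fail on HTTP errors, e.g. when health times out before yellow
  until curl -fsS -m 10 "$url/_cluster/health?wait_for_status=yellow&timeout=5s" >/dev/null 2>&1; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "Elasticsearch at $url not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 2
  done
  echo "Elasticsearch at $url is up"
}
```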
3.6 Check that the plugins are installed
bash
curl "http://127.0.0.1:9200/_cat/plugins?v"
Example of a successful response:
text
name   component       version
node-1 analysis-ik     7.17.23
node-1 analysis-pinyin 7.17.23
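Reading that table by eye works for one node; with several nodes a small filter helps. This sketch (hypothetical name `check_cat_plugins`) reads the `_cat/plugins?v` output on stdin and fails unless every listed node reports both plugins at the expected version.

```bash
# check_cat_plugins EXPECTED_VERSION < cat-plugins-output  (hypothetical helper)
check_cat_plugins() {
  awk -v want="$1" '
    NF < 3 { next }                                  # skip blank or short lines
    $1 == "name" && $2 == "component" { next }       # skip the ?v header
    {
      if (!($1 in nodes)) { nodes[$1] = 1; count++ }
      if ($2 == "analysis-ik")     ik[$1] = $3
      if ($2 == "analysis-pinyin") py[$1] = $3
    }
    END {
      bad = (count == 0)
      if (bad) print "no plugins reported"
      for (n in nodes) {
        if (ik[n] != want) { print n ": analysis-ik missing or not " want; bad = 1 }
        if (py[n] != want) { print n ": analysis-pinyin missing or not " want; bad = 1 }
      }
      exit bad
    }'
}
```

Usage: `curl -s "http://127.0.0.1:9200/_cat/plugins?v" | check_cat_plugins 7.17.23`.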
4. Testing the Built-in Analyzers (no custom setup needed)
4.1 IK Chinese analysis
4.1.1 ik_max_word (finest-grained split)
bash
curl -XPOST http://127.0.0.1:9200/_analyze \
-H "Content-Type: application/json" \
-d '{
"analyzer": "ik_max_word",
"text": "CentOS7安装ES分词器"
}'
4.1.2 ik_smart (coarsest-grained split)
bash
curl -XPOST http://127.0.0.1:9200/_analyze \
-H "Content-Type: application/json" \
-d '{
"analyzer": "ik_smart",
"text": "CentOS7安装ES分词器"
}'
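The same request shape repeats for every analyzer, so it can be wrapped in a helper. A sketch with hypothetical names `analyze_body` and `es_analyze`; the body is built with `printf` and is not JSON-escaped, so it assumes the text contains no double quotes, backslashes, or newlines. The optional third argument switches to the index-scoped endpoint that custom analyzers require.

```bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# analyze_body ANALYZER TEXT -> JSON body (no escaping, see assumption above)
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# es_analyze ANALYZER TEXT [INDEX]
# Without INDEX: /_analyze (built-in and plugin analyzers only).
# With INDEX:    /INDEX/_analyze (needed for analyzers defined in that index).
es_analyze() {
  local path="_analyze"
  if [ -n "${3:-}" ]; then
    path="$3/_analyze"
  fi
  curl -sS -XPOST "$ES_URL/$path?pretty" \
    -H "Content-Type: application/json" \
    -d "$(analyze_body "$1" "$2")"
}
```

For example `es_analyze ik_smart "CentOS7安装ES分词器"`, or later `es_analyze pinyin_analyzer "程序员" test_index`.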
4.2 Built-in pinyin analyzer
bash
curl -XPOST http://127.0.0.1:9200/_analyze \
-H "Content-Type: application/json" \
-d '{
"analyzer": "pinyin",
"text": "程序员"
}'
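The `_analyze` response carries offsets, types, and positions for each token; to compare analyzers quickly only the `token` values matter. A grep/sed sketch (hypothetical name `tokens_from_response`) that works on both compact and `?pretty` output but assumes tokens contain no escaped quotes; `jq`, if installed, is the more robust choice.

```bash
# tokens_from_response < analyze-response.json  (hypothetical helper)
# Prints one token per line.
tokens_from_response() {
  grep -o '"token" *: *"[^"]*"' | sed -e 's/^"token" *: *"//' -e 's/"$//'
}
```

Usage: `curl -s -XPOST http://127.0.0.1:9200/_analyze -H "Content-Type: application/json" -d '{"analyzer":"pinyin","text":"程序员"}' | tokens_from_response`.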
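5. Custom Analyzers (Full Pinyin + First Letters)
5.1 Rules
- Custom analyzers must be defined in the index settings.
- They cannot be added to an open index: close the index, update its settings, and reopen it, or delete the index and recreate it.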
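Deleting the index is not the only option, since ES accepts new analysis settings on a closed index. A sketch of the close → update settings → open sequence using the `_close`, `_settings`, and `_open` APIs; `add_analyzer_to_index`, `run`, and `DRY_RUN` are hypothetical names. The index cannot be searched or written while closed, and existing fields keep their analyzer: changing a field's analyzer still needs a new index and a reindex.

```bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# add_analyzer_to_index INDEX ANALYSIS_JSON   (hypothetical helper)
add_analyzer_to_index() {
  local index="$1" analysis_json="$2"
  run curl -fsS -XPOST "$ES_URL/$index/_close" || return 1
  if ! run curl -fsS -XPUT "$ES_URL/$index/_settings" \
      -H "Content-Type: application/json" \
      -d "{\"analysis\": $analysis_json}"; then
    echo "settings update failed, reopening $index" >&2
    run curl -fsS -XPOST "$ES_URL/$index/_open"
    return 1
  fi
  run curl -fsS -XPOST "$ES_URL/$index/_open"
}
```

Example: `add_analyzer_to_index test_index '{"analyzer":{"pinyin_first_letter":{"type":"pinyin","keep_first_letter":true,"keep_full_pinyin":false}}}'`.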
5.2 Common index commands
5.2.1 List all indices
bash
curl "http://127.0.0.1:9200/_cat/indices?v"
5.2.2 Delete a single index
bash
# Replace index_name with the index to delete
curl -XDELETE http://127.0.0.1:9200/index_name
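Because `_all` or a wildcard typed into 5.2.2 can delete many indices at once, a guarded wrapper may be worth keeping around. A sketch with the hypothetical name `safe_delete_index`; on the server side, the cluster setting `action.destructive_requires_name` gives similar protection.

```bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# safe_delete_index INDEX  (hypothetical helper)
# Refuses empty names, _all, wildcards, and comma-separated lists.
safe_delete_index() {
  local index="$1"
  case "$index" in
    ""|_all|*'*'*|*,*)
      echo "refusing to delete '$index': give one exact index name" >&2
      return 2 ;;
  esac
  if [ -n "${DRY_RUN:-}" ]; then
    echo "curl -XDELETE $ES_URL/$index"
  else
    curl -sS -XDELETE "$ES_URL/$index"
  fi
}
```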
5.3 Create a test index (with the custom analyzers)
bash
curl -XPUT http://127.0.0.1:9200/test_index \
-H "Content-Type: application/json" \
-d '{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"pinyin_analyzer": {
"type": "pinyin",
"keep_full_pinyin": true,
"keep_joined_full_pinyin": true,
"keep_none_chinese": true,
"lowercase": true,
"keep_original": false
},
"pinyin_first_letter": {
"type": "pinyin",
"keep_first_letter": true,
"keep_full_pinyin": false,
"keep_joined_full_pinyin": false
}
}
}
}
}'
5.4 Test the custom analyzers
5.4.1 Full pinyin
bash
curl -XPOST http://127.0.0.1:9200/test_index/_analyze \
-H "Content-Type: application/json" \
-d '{
"analyzer": "pinyin_analyzer",
"text": "程序员"
}'
5.4.2 First letters
bash
curl -XPOST http://127.0.0.1:9200/test_index/_analyze \
-H "Content-Type: application/json" \
-d '{
"analyzer": "pinyin_first_letter",
"text": "程序员"
}'
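6. Global Index Template (applies to every new index)
6.1 Purpose
Every newly created index automatically gets the IK, full-pinyin, and first-letter analyzers, with no per-index configuration.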
6.2 Create the global template
bash
curl -XPUT http://127.0.0.1:9200/_template/global_ik_pinyin_template \
-H "Content-Type: application/json" \
-d '{
"index_patterns": ["*"],
"order": 10,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"ik_max_word": {"type":"ik_max_word"},
"ik_smart": {"type":"ik_smart"},
"pinyin_analyzer": {
"type":"pinyin",
"keep_full_pinyin":true,
"keep_joined_full_pinyin":true,
"lowercase":true
},
"pinyin_first_letter": {
"type":"pinyin",
"keep_first_letter":true,
"keep_full_pinyin":false
}
}
}
}
}'
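Since a template only affects indices created after it, one way to confirm it works is to create a throw-away index and look for the analyzer in its settings. A sketch; `verify_template` and the index name `tmpl_check_index` are hypothetical, and the probe index is deleted at the end.

```bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# verify_template [PROBE_INDEX]  (hypothetical helper)
verify_template() {
  local probe="${1:-tmpl_check_index}" ok=1
  curl -fsS "$ES_URL/_template/global_ik_pinyin_template" >/dev/null \
    || { echo "template not found" >&2; return 1; }
  curl -fsS -XPUT "$ES_URL/$probe" >/dev/null || return 1
  # The template should have copied its analysis settings into the new index.
  if curl -fsS "$ES_URL/$probe/_settings" | grep -q 'pinyin_analyzer'; then
    echo "template applied to new index $probe"
    ok=0
  else
    echo "template NOT applied to $probe" >&2
  fi
  curl -fsS -XDELETE "$ES_URL/$probe" >/dev/null   # clean up the probe index
  return $ok
}
```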
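7. Production Mapping Example
Suitable for Chinese + pinyin search on articles, posts, products, and similar content. pinyin_analyzer and pinyin_first_letter are not built in: this request relies on the global template from section 6 (or on the same analysis block in the request's own settings); without it, index creation fails.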
bash
curl -XPUT http://127.0.0.1:9200/article \
-H "Content-Type: application/json" \
-d '{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart",
"fields": {
"pinyin": {
"type": "text",
"analyzer": "pinyin_analyzer"
},
"first_char": {
"type": "text",
"analyzer": "pinyin_first_letter"
}
}
}
}
}
}'
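With the three sub-fields in place, one query can search Chinese text, full pinyin, and first letters together. A sketch using a `multi_match` query over `title`, `title.pinyin`, and `title.first_char`; `search_body` and `search_article` are hypothetical names, and the query string is not JSON-escaped.

```bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# search_body QUERY -> multi_match body over the title sub-fields
search_body() {
  printf '{"query":{"multi_match":{"query":"%s","fields":["title","title.pinyin","title.first_char"]}}}' "$1"
}

# search_article QUERY  (hypothetical helper)
search_article() {
  curl -sS -XGET "$ES_URL/article/_search?pretty" \
    -H "Content-Type: application/json" \
    -d "$(search_body "$1")"
}
```

For example `search_article 程序员`, `search_article chengxuyuan`, or `search_article cxy`; which documents match depends on your data.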
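8. Common Errors
| Symptom | Cause | Fix |
|---|---|---|
| ES fails to start | index.analysis written in elasticsearch.yml | Remove all index.analysis settings from the yml |
| Can't update non dynamic settings | Adding an analyzer to an existing, open index | Close the index, update the settings, and reopen it; or delete it and recreate it with the analyzer |
| failed to find global analyzer | Custom analyzer used with the bare /_analyze | Include the index name: /xxx/_analyze |
| Connection refused while downloading | The server cannot reach the download site | Install offline: download the zip on another machine, copy it to the server, and unzip it into the plugins directory |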
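9. Appendix: Clearing the ES Read-Only Lock
9.1 Unlock all indices at once
ES sets index.blocks.read_only_allow_delete when a node's disk usage passes the flood-stage watermark. Free disk space first; otherwise the block can be applied again.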
bash
curl -XPUT -H "Content-Type: application/json" http://127.0.0.1:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
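Unlocking only helps once the disk problem is gone. A sketch (hypothetical name `show_disk_and_blocks`) that prints per-node disk usage with `_cat/allocation` and the current value of the block on each index, to run before the unlock command.

```bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# show_disk_and_blocks  (hypothetical helper)
show_disk_and_blocks() {
  # Disk used / available per node
  curl -sS "$ES_URL/_cat/allocation?v"
  # Indices that currently carry the block (empty result = none)
  curl -sS "$ES_URL/_all/_settings/index.blocks.read_only_allow_delete?pretty"
}
```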