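Table of Contents
- Analyzers
  - 1. Character filters
  - 2. Custom analyzers
  - 3. Chinese analyzer (IK)
    - 1. Download IK analyzer 7.10.0 (the plugin version must match the Elasticsearch version)
    - 2. Install the IK analyzer
      - 1. Create an ik directory under the plugins directory
      - 2. Upload the downloaded archive into the ik directory
      - 3. Install unzip and extract the archive
      - 4. Restart Elasticsearch
        - 1. Find the Elasticsearch processes, then stop them with kill -9
        - 2. Restart the three nodes
        - 3. Check the cluster status with the head plugin
    - 3. Test that the IK analyzer works
    - 4. IK analyzer configuration files
- Bulk operations on indices
- Fuzzy search and search suggestions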
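Analyzers
1. Character filters
1. Introduction
A character filter preprocesses the raw text before it reaches the tokenizer: it can add, remove, or replace characters. Elasticsearch ships three built-in types: html_strip, mapping, and pattern_replace.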
2. Stripping HTML tags (html_strip)
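To make the effect concrete, here is a minimal Python sketch of what an html_strip-style character filter does to the text before tokenization. It is only an approximation for illustration: the function name strip_html is hypothetical, the regex ignores edge cases such as comments and script blocks, and tags are simply dropped rather than handled the way Elasticsearch handles block-level tags.

```python
import html
import re

# Matches any HTML tag such as <p>, </p>, or <br/>. Simplification: real HTML
# has more cases (comments, CDATA, attribute values that contain '>').
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Approximate an html_strip character filter: drop tags, decode entities."""
    without_tags = TAG_RE.sub("", text)
    return html.unescape(without_tags)


print(strip_html("<p>I&apos;m so <b>happy</b>!</p>"))  # I'm so happy!
```

The tokenizer only ever sees the cleaned string, so markup never ends up as terms in the index.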
3. Mapping rules (masking offensive words)
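A mapping character filter is configured with a list of "key => value" rules and replaces every occurrence of a key with its value. The Python sketch below imitates that behaviour; build_mapping_filter and the two placeholder words are assumptions made up for this illustration, not part of Elasticsearch.

```python
import re


def build_mapping_filter(rules):
    """Return a function that replaces each key with its value, longest key first."""
    table = {}
    for rule in rules:
        key, value = rule.split("=>")
        table[key.strip()] = value.strip()
    # Longest keys come first so that the longest match wins at each position.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


# Hypothetical rules that mask two placeholder words.
mask = build_mapping_filter(["darn => ****", "heck => ****"])
print(mask("what the heck, darn it"))  # what the ****, **** it
```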
4. Regular-expression replacement (pattern_replace)
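A pattern_replace character filter takes a pattern and a replacement, and the replacement may refer to capture groups. The sketch below shows the idea in Python with an assumed use case (masking the middle digits of an 11-digit phone number); the scenario and the name mask_phone are illustrative and not taken from the original screenshots.

```python
import re

# Keep the first three and last four digits, hide the four in between.
PHONE_RE = re.compile(r"(\d{3})\d{4}(\d{4})")


def mask_phone(text: str) -> str:
    # Python writes group references as \1 and \2; a Java-style regex
    # replacement (as used by Elasticsearch) writes them as $1 and $2.
    return PHONE_RE.sub(r"\1****\2", text)


print(mask_phone("call 13812345678 now"))  # call 138****5678 now
```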
2. Custom analyzers
1. Code
java
PUT /test_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip" // custom character filter of type html_strip: strips HTML tags
        }
      },
      "filter": {
        "my_stopword": {
          "type": "stop", // custom token filter that removes stop words
          "stopwords": ["is", "a"] // stop-word list: "is" and "a"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern", // custom tokenizer that splits on a regular expression
          "pattern": "[,\\.?\\s]+" // commas, periods, question marks, and whitespace act as separators
        }
      },
      "analyzer": {
        "my_analysis": {
          "type": "custom", // custom analyzer
          "char_filter": ["my_char_filter"], // apply the character filter my_char_filter
          "tokenizer": "my_tokenizer", // use the tokenizer my_tokenizer
          "filter": ["my_stopword"] // apply the stop-word filter my_stopword
        }
      }
    }
  }
}
2. Testing the analyzer
java
GET /test_index/_analyze
{
  "analyzer": "my_analysis",
  "text": "<p>This is a test, isn't it amazing?</p>"
}
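The three stages of my_analysis (character filter, tokenizer, token filter) can be traced by hand. The Python sketch below re-implements them with the same separator pattern and stop-word list; it is a rough stand-in written for this walkthrough (the tag regex only approximates html_strip), so treat the printed tokens as what the pipeline is expected to produce rather than as captured Elasticsearch output.

```python
import re

STOPWORDS = {"is", "a"}             # same list as my_stopword
SPLIT_RE = re.compile(r"[,.?\s]+")  # same separators as my_tokenizer
TAG_RE = re.compile(r"<[^>]+>")     # rough stand-in for html_strip


def analyze(text):
    stripped = TAG_RE.sub(" ", text)                     # 1. character filter
    tokens = [t for t in SPLIT_RE.split(stripped) if t]  # 2. tokenizer
    return [t for t in tokens if t not in STOPWORDS]     # 3. token filter


print(analyze("<p>This is a test, isn't it amazing?</p>"))
# ['This', 'test', "isn't", 'it', 'amazing']
```

Two details are easy to miss: "This" keeps its capital letter because the analyzer has no lowercase filter, and "isn't" survives because the stop filter compares whole tokens, not substrings.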
3. Chinese analyzer (IK)
1. Download IK analyzer 7.10.0 (the plugin version must match the Elasticsearch version)
https://release.infinilabs.com/analysis-ik/stable/
2. Install the IK analyzer
1. Create an ik directory under the plugins directory
sh
cd /usr/local/ElasticSearch/elasticsearch-7.10.0/plugins && mkdir ik
2. Upload the downloaded archive into the ik directory
sh
cd ik
3. Install unzip and extract the archive
1. Install unzip
sh
yum install unzip
2. Extract the archive
sh
unzip elasticsearch-analysis-ik-7.10.0.zip
3. Delete the zip file
sh
rm -f elasticsearch-analysis-ik-7.10.0.zip
4. Copy the ik directory to the other two nodes
sh
cp -r /usr/local/ElasticSearch/elasticsearch-7.10.0/plugins/ik /usr/local/ElasticSearch/node2/plugins && cp -r /usr/local/ElasticSearch/elasticsearch-7.10.0/plugins/ik /usr/local/ElasticSearch/node3/plugins
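4. Restart Elasticsearch
1. Find the Elasticsearch processes, then stop them with kill -9
sh
ps -aux | grep elasticsearch
kill -9 <pid>  # run once for each Elasticsearch PID listed above
2. Restart the three nodes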
sh
su elasticsearch
sh
cd /usr/local/ElasticSearch/elasticsearch-7.10.0/bin/ && ./elasticsearch -d && cd /usr/local/ElasticSearch/node2/bin/ && ./elasticsearch -d && cd /usr/local/ElasticSearch/node3/bin/ && ./elasticsearch -d
3. Check the cluster status with the head plugin
3. Test that the IK analyzer works
1. Create an index that uses the IK analyzers
If test_index still exists from the custom analyzer example, delete it first with DELETE /test_index; otherwise this request fails with resource_already_exists_exception.
java
PUT /test_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "description": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}
2. Insert a document
java
POST /test_index/_doc
{
  "title": "Elasticsearch 是一个分布式搜索引擎",
  "description": "IK 分词器支持中文分词,并且可以用于全文检索"
}
3. Inspect the analysis results
ik_smart produces the coarsest-grained split, while ik_max_word emits every word it can find, so the description field yields more, partly overlapping tokens.
java
POST /test_index/_analyze
{
  "field": "title",
  "text": "Elasticsearch 是一个分布式搜索引擎"
}
java
POST /test_index/_analyze
{
  "field": "description",
  "text": "IK 分词器支持中文分词,并且可以用于全文检索"
}
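4. IK analyzer configuration files
The plugin's config directory holds IKAnalyzer.cfg.xml together with the bundled dictionary files. In IKAnalyzer.cfg.xml, the ext_dict and ext_stopwords entries point to custom word and stop-word dictionaries, and remote_ext_dict and remote_ext_stopwords do the same for dictionaries served over HTTP.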
Bulk operations on indices
1. Batch retrieval with mget
1. Setup
java
PUT /new_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}
java
POST /new_index/_doc/1
{
  "name": "Alice",
  "age": 25
}
POST /new_index/_doc/2
{
  "name": "Bob",
  "age": 30
}
POST /new_index/_doc/3
{
  "name": "Charlie",
  "age": 22
}
2. Retrieve several documents by ID
java
GET /new_index/_mget
{
  "ids": ["1", "2"]
}
3. You can also choose which fields to return for each document
The _source filter keys are includes and excludes; the singular forms include and exclude are deprecated aliases.
java
GET /new_index/_mget
{
  "docs": [
    {
      "_id": "1",
      "_source": {
        "includes": [
          "name"
        ]
      }
    },
    {
      "_id": "2",
      "_source": {
        "excludes": [
          "name"
        ]
      }
    }
  ]
}
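2. The four document operation types
1. Introduction
Single documents are handled through four endpoints: _create, DELETE, _search, and _update.
2. _create: create a document
PUT /<index>/_create/<id> creates the document only if that ID does not exist yet; otherwise the request fails with a version conflict (HTTP 409).
3. Delete a document
DELETE /<index>/_doc/<id> removes the document with the given ID.
4. _search: query documents
GET /<index>/_search runs a query against the index; with no request body it matches all documents.
5. _update: update a document
POST /<index>/_update/<id> with a "doc" object merges the given fields into the existing document (a partial update).
3. _bulk
POST /_bulk (or POST /<index>/_bulk) carries many index, create, update, and delete actions in one request. The body is newline-delimited JSON: one action line, followed by a source line for every action except delete, and it must end with a newline.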
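Because the _bulk body is newline-delimited JSON rather than one JSON document, it is easy to get wrong when assembled by hand. The following Python sketch shows one way to build such a body; the helper name bulk_body and the sample documents (Dave, Eve) are hypothetical, and the sketch only produces the text to send, it does not talk to a cluster.

```python
import json


def bulk_body(operations) -> str:
    """Serialize (action, metadata, source) triples into an NDJSON _bulk body."""
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))  # action line
        if source is not None:                    # delete has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"                # body must end with a newline


body = bulk_body([
    ("index",  {"_index": "new_index", "_id": "4"}, {"name": "Dave", "age": 41}),
    ("create", {"_index": "new_index", "_id": "5"}, {"name": "Eve", "age": 35}),
    ("update", {"_index": "new_index", "_id": "1"}, {"doc": {"age": 26}}),
    ("delete", {"_index": "new_index", "_id": "2"}, None),
])
print(body, end="")
```

The resulting text would be sent as the body of POST /_bulk with the Content-Type application/x-ndjson. Each action succeeds or fails on its own, so the response has to be checked per item.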
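Fuzzy search and search suggestions
1. Overview
This section covers queries that match partial or misspelled input: prefix, wildcard, regexp, fuzzy, and match_phrase_prefix queries, plus edge_ngram indexing for search-as-you-type.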
2. Prefix search
1. Sample data
java
# Sample data for the prefix search examples
POST /my_index/_bulk?filter_path=items.*.error
{ "index": { "_id": "1" } }
{ "text": "城管打电话喊商贩去摆摊摊" }
{ "index": { "_id": "2" } }
{ "text": "笑果文化回应老农去摆摊" }
{ "index": { "_id": "3" } }
{ "text": "老农耗时17年种出椅子树" }
{ "index": { "_id": "4" } }
{ "text": "夫妻结婚30多年AA制,被城管抓" }
{ "index": { "_id": "5" } }
{ "text": "黑人见义勇为阻止抢劫反被铐住" }
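2. Query
java
GET /my_index/_search
{
  "query": {
    "prefix": {
      "text": {
        "value": "笑"
      }
    }
  }
}
3. How it works
A prefix query is matched against the terms in the inverted index, not against the whole field value. The bulk request above lets Elasticsearch create my_index with the default standard analyzer, which indexes each Chinese character as its own term, so the prefix 笑 matches document 2 through the single-character term 笑.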
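The difference between matching terms and matching the whole field is easier to see on a toy inverted index. The Python sketch below is an illustration under simplifying assumptions: lowercasing plus whitespace splitting stands in for the analyzer, and the English sample documents are made up for the example.

```python
import bisect
from collections import defaultdict


def build_index(docs):
    """Map each lowercased, whitespace-separated term to the IDs of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def prefix_search(index, prefix):
    """Return IDs of documents with at least one term that starts with prefix."""
    terms = sorted(index)
    start = bisect.bisect_left(terms, prefix)  # first term >= prefix
    hits = set()
    for term in terms[start:]:
        if not term.startswith(prefix):
            break                              # sorted order: no later term can match
        hits |= index[term]
    return hits


docs = {1: "quick brown fox", 2: "brown bread", 3: "lazy dog"}
index = build_index(docs)
print(sorted(prefix_search(index, "bro")))      # [1, 2]
print(sorted(prefix_search(index, "quick b")))  # []
```

Document 1 matches "bro" even though its field value starts with "quick", while "quick b" matches nothing because no single term contains a space.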
3. Wildcard search
1. Overview
Wildcard patterns are also matched against the terms in the inverted index.
2. Code
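In a wildcard query, * stands for zero or more characters and ? for exactly one, and the pattern has to match an entire term. The Python sketch below reproduces those rules over a hypothetical term list; the helper names are made up for this illustration.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate a wildcard pattern (* and ? only) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def wildcard_terms(terms, pattern):
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]  # whole term must match


terms = ["elastic", "elasticsearch", "search", "seared"]
print(wildcard_terms(terms, "ela*"))     # ['elastic', 'elasticsearch']
print(wildcard_terms(terms, "sear??"))   # ['search', 'seared']
print(wildcard_terms(terms, "*search"))  # ['elasticsearch', 'search']
```

A pattern that starts with * or ? cannot narrow the scan of the term dictionary, which is why leading wildcards are usually discouraged on large indices.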
4. Regular-expression search
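A regexp query likewise runs against individual terms, and the expression must match the whole term rather than a substring. The sketch below illustrates that anchoring with Python's re module; note that Python's regex syntax is not identical to the Lucene syntax Elasticsearch uses, and the term list is hypothetical.

```python
import re


def regexp_terms(terms, pattern):
    """Return the terms that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]


terms = ["es7", "es8", "es10", "esx"]
print(regexp_terms(terms, "es[0-9]+"))  # ['es7', 'es8', 'es10']
print(regexp_terms(terms, "es"))        # []
```

The second call returns nothing because "es" is only a substring of each term; a pattern such as "es.*" would be needed to match them.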
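5. Fuzzy query
1. Introduction
A fuzzy query matches terms that lie within a given edit distance of the search term, which makes it tolerant of typos.
2. Code
Note: the fuzzy query does not analyze its input (it is a term-level query), whereas the match query does. Use match with the fuzziness parameter when the input should be analyzed first.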
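The edit distance behind fuzzy matching counts insertions, deletions, substitutions, and swaps of two adjacent characters. The Python sketch below implements that distance together with the length thresholds of fuzziness AUTO (exact match up to 2 characters, one edit for 3 to 5, two edits beyond that). It illustrates the rule over a hypothetical term list and is not how Elasticsearch evaluates fuzzy queries internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Distance with insert, delete, substitute, and adjacent-swap edits."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by fuzziness AUTO with its default length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_terms(terms, query):
    limit = auto_fuzziness(query)
    return [t for t in terms if edit_distance(query, t) <= limit]


terms = ["elasticsearch", "elastic", "search"]
print(edit_distance("elasticsaerch", "elasticsearch"))  # 1
print(fuzzy_terms(terms, "serch"))                      # ['search']
```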
6. Phrase prefix (match_phrase_prefix)
1. Introduction
match_phrase_prefix matches the query terms as a phrase, in order, and treats the last term as a prefix.
2. Code
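The matching rule can be sketched in a few lines of Python. This is a simplified model that assumes whitespace tokenization and zero slop, and it leaves out the max_expansions limit (50 by default) that caps how many terms the final prefix may expand to; the function name and sample text are hypothetical.

```python
def phrase_prefix_match(doc_text: str, query: str) -> bool:
    """True if the query terms appear consecutively, with the last one as a prefix."""
    doc = doc_text.lower().split()
    q = query.lower().split()
    if not q:
        return False
    head, last = q[:-1], q[-1]
    for start in range(len(doc) - len(q) + 1):
        window = doc[start:start + len(q)]
        # All terms but the last must match exactly; the last one only as a prefix.
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False


print(phrase_prefix_match("quick brown fox", "quick bro"))  # True
print(phrase_prefix_match("quick brown fox", "quick f"))    # False
```

The second query fails because "fox" does not directly follow "quick", which is what separates a phrase prefix from a plain prefix query.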
7. edge_ngram
If my_index was already created by the bulk request in the prefix search section, delete it first with DELETE /my_index and load the sample data again after creating the index with this mapping.
java
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 5
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_edge_ngram"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
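To see which terms an edge_ngram filter with min_gram 2 and max_gram 5 puts into the index, the expansion can be reproduced in a few lines of Python. This is a sketch under simplifying assumptions: whitespace splitting stands in for the standard tokenizer, and the function names are hypothetical.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 5):
    """Return the prefixes of token whose length lies between min_gram and max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text: str):
    """Lowercase, split on whitespace, then expand each token into edge n-grams."""
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token))
    return terms


print(edge_ngrams("elastic"))    # ['el', 'ela', 'elas', 'elast']
print(index_terms("Quick fox"))  # ['qu', 'qui', 'quic', 'quick', 'fo', 'fox']
```

This also explains the search_analyzer setting in the mapping: with the standard analyzer at search time, the input "qui" stays a single term and matches the indexed gram "qui". If the query were expanded into grams as well, "qui" would also produce "qu" and match every document with a word starting with "qu".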