Installing the Elasticsearch Search Engine with Docker: Search Optimization, Dictionary Mounting, and Pinyin Analyzer Plugin Installation

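Introduction

Elasticsearch lets users index and search large volumes of text quickly. By using an inverted index, it retrieves relevant information efficiently from massive data sets. It provides a flexible query language that supports full-text search, fuzzy search, and statistics over the data, and it is often used in place of MySQL fuzzy search: a LIKE pattern with a leading wildcard (for example '%keyword%') cannot use a regular index, so MySQL falls back to scanning the table and search performance is very poor.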
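To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is not how Elasticsearch is implemented; the function names and the three sample documents are made up for illustration, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting stands in for a real analyzer here.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by id.
docs = {
    1: "docker install elasticsearch",
    2: "elasticsearch pinyin plugin",
    3: "mysql fuzzy search",
}
index = build_inverted_index(docs)
print(sorted(search(index, "elasticsearch")))         # [1, 2]
print(sorted(search(index, "elasticsearch plugin")))  # [2]
```

A lookup only touches the entries for the query terms, which is why it stays fast as the number of documents grows, whereas a wildcard scan has to read every row.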

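MySQL: good at transactional operations; it ensures data safety and consistency.
Elasticsearch: good at searching, analyzing, and computing over massive amounts of data.

For write operations with strict safety requirements, use MySQL.
For search workloads with strict query-performance requirements, use Elasticsearch.
The two are then kept in sync by some mechanism so that the data stays consistent.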

Pull the image

docker pull elasticsearch:7.12.1

Run the container

docker run -d \
  --name es \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=true" \
  -e "ELASTIC_PASSWORD=qwertyuiop" \
  -v es-data:/usr/share/elasticsearch/data \
  -v es-plugins:/usr/share/elasticsearch/plugins \
  --privileged \
  -p 9200:9200 \
  -p 9300:9300 \
  --restart unless-stopped \
  elasticsearch:7.12.1

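-e "discovery.type=single-node": sets an Elasticsearch environment variable so that the node runs in single-node mode.
-e "xpack.security.enabled=true": enables security, so requests require password authentication.
-e "ELASTIC_PASSWORD=qwertyuiop": sets the password of the built-in administrator user (elastic) to qwertyuiop.
-v es-data:/usr/share/elasticsearch/data: mounts the named volume es-data as the data directory.
-v es-plugins:/usr/share/elasticsearch/plugins: mounts the named volume es-plugins as the plugin directory.
--privileged: gives the container extended privileges on the host. It does not control external access; that comes from the published ports below.
-p 9200:9200: publishes the HTTP port that serves REST API requests.
-p 9300:9300: publishes port 9300, the transport port used for connections and communication between cluster nodes.
--restart unless-stopped: sets the restart policy so that the container is started again automatically, including when Docker comes back up after a reboot, unless it was stopped manually.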

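Verifying the deployment

Visit http://172.23.4.130:9200/. Because security is enabled, log in as elastic with the password set above.

The server returns its version information: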

{
  "name" : "cef1cc2e9f43",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "rkyxYQ7aTVSvg0hR-GrOtg",
  "version" : {
    "number" : "7.12.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3186837139b9c6b6d23c3200870651f10d3343b7",
    "build_date" : "2021-04-20T20:56:39.040728659Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
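The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the node accepts HTTP Basic authentication for the elastic user, as configured by the docker run command above.

```python
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def get_cluster_info(base_url, user, password):
    """GET the root endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        base_url,
        headers={"Authorization": basic_auth_header(user, password)},
    )
    # Raises urllib.error.HTTPError on a 401 if the credentials are wrong.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Against the container from this post, get_cluster_info("http://172.23.4.130:9200/", "elastic", "qwertyuiop")["version"]["number"] should give the version string shown in the response above.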

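Installing an analyzer

An analyzer is the tool that processes text during indexing. It breaks text into "words" or "terms", which are then used to build the index.

Segmenting Chinese text usually depends on semantics and is more involved, so a dedicated Chinese analyzer is needed.

Source repository: https://github.com/infinilabs/analysis-ik

analysis-ik

A Chinese analyzer built on IK Analyzer and designed for analyzing and segmenting Chinese text. It is widely used for text analysis in Elasticsearch: it splits the input Chinese text into indexable units, which improves the accuracy and performance of full-text search.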

Installing the analyzer plugin

List the volumes:

docker volume ls

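Find the mount directory of the plugin volume:

docker volume inspect es-plugins

In the directory shown in the Mountpoint field of the output, create a folder for the plugin:

mkdir ik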

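Upload the unpacked plugin files into the ik folder. The plugin release has to match the Elasticsearch version (7.12.1 here).

Restart the container:

docker restart es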

Check the logs to confirm that the plugin was loaded:

docker logs es
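Besides reading the logs, the list of installed plugins can be requested with GET /_cat/plugins?format=json. The sketch below only parses such a response; the helper name is hypothetical, the sample row is illustrative rather than captured output, and it assumes the plugin registers itself under the name analysis-ik.

```python
import json


def installed_plugins(cat_plugins_json):
    """Return the set of plugin names from a `_cat/plugins?format=json` body."""
    rows = json.loads(cat_plugins_json)
    # Each row describes one plugin on one node; "component" is the plugin name.
    return {row["component"] for row in rows}


# Illustrative body in the shape returned by GET /_cat/plugins?format=json
# (node name and version borrowed from this post).
sample = '[{"name": "cef1cc2e9f43", "component": "analysis-ik", "version": "7.12.1"}]'
print("analysis-ik" in installed_plugins(sample))  # True
```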

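Using the analyzer

Request URL: http://172.23.4.130:9200/_analyze
Request method: POST
Request body (analyzer selects the analyzer: standard is the default, and ik_smart is the one to use for Chinese; text is the text to tokenize):

{
    "analyzer": "standard",
    "text": "Docker 安装Elasticsearch搜索引擎 搜索优化 词库挂载 插件安装"
}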

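The standard analyzer does not segment Chinese into words (it splits Chinese text into single characters), which is why the IK Chinese dictionary is used.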
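A request like this can also be sent from a script. This is a minimal sketch with the Python standard library, not an official client; the function names are hypothetical, and Basic authentication with the elastic user from the docker run command is assumed.

```python
import base64
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text, user, password):
    """Build a POST request for the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


def analyze(base_url, analyzer, text, user, password):
    """Send the request and return the list of token strings."""
    request = build_analyze_request(base_url, analyzer, text, user, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return [t["token"] for t in json.load(response)["tokens"]]
```

Calling analyze("http://172.23.4.130:9200", "standard", "客家", "elastic", "qwertyuiop") and then the same call with "ik_smart" makes it easy to compare the two analyzers on the same text.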

[Figure: tokenization result]

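Custom dictionary

IK supports custom dictionaries to fit project needs. For example, my online store sells intangible cultural heritage (非遗) products, so the names of heritage items can be added as dictionary entries.

Configuration file: /ik/config/IKAnalyzer.cfg.xml

Its contents: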

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
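	<comment>IK Analyzer extension configuration</comment>

	<!-- configure your own extension dictionary here -->
	<entry key="ext_dict">mx.dic</entry>
	<!-- create this file in the same directory -->

	<!-- configure your own extension stop-word dictionary here -->
	<entry key="ext_stopwords"></entry>

	<!-- configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->

	<!-- configure a remote extension stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->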
</properties>

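In other words, create the dictionary file in the same directory as the configuration file and reference it by file name; it can then be used directly.

Stop-word dictionary: lists words that do not need to be indexed as tokens, such as interjections like 嗯, 啊, and 哦.

Before adding the custom dictionary, send this request: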

{
    "analyzer":"ik_smart",
    "text":"我喜欢梅州客家非遗"
}

Response:

{
    "tokens": [
    ......
        {
            "token": "客家",
            "start_offset": 5,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "非",
            "start_offset": 7,
            "end_offset": 8,
            "type": "CN_CHAR",
            "position": 4
        },
        {
            "token": "遗",
            "start_offset": 8,
            "end_offset": 9,
            "type": "CN_CHAR",
            "position": 5
        }
    ]
}

This is where a custom dictionary helps: 非遗 was split into the single characters 非 and 遗.
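Spotting such cases by eye gets tedious on longer texts. As a rough aid, the sketch below joins runs of adjacent single-character tokens from an _analyze response into candidate dictionary entries. It is a heuristic of my own for illustration, not a feature of IK; the function name is hypothetical, and the output still needs a human check because some single characters are legitimate tokens.

```python
def candidate_terms(tokens):
    """Join runs of adjacent single-character tokens into candidate words."""
    candidates = []
    run = []
    for token in tokens:
        # Adjacent means this token starts exactly where the previous one ended.
        adjacent = bool(run) and token["start_offset"] == run[-1]["end_offset"]
        if token["type"] == "CN_CHAR" and (not run or adjacent):
            run.append(token)
            continue
        if len(run) > 1:
            candidates.append("".join(t["token"] for t in run))
        run = [token] if token["type"] == "CN_CHAR" else []
    if len(run) > 1:
        candidates.append("".join(t["token"] for t in run))
    return candidates


# The three tokens listed in the response above.
tokens = [
    {"token": "客家", "start_offset": 5, "end_offset": 7, "type": "CN_WORD"},
    {"token": "非", "start_offset": 7, "end_offset": 8, "type": "CN_CHAR"},
    {"token": "遗", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR"},
]
print(candidate_terms(tokens))  # ['非遗']
```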

Custom configuration

<entry key="ext_dict">mx.dic</entry>

Contents of mx.dic, one term per line:

非遗
......

Segmentation now succeeds: 非遗 is returned as a single token.
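If the terms come from application data, the dictionary file can be generated rather than edited by hand. The sketch below writes one term per line in UTF-8, which is the encoding IK expects for extension dictionaries. The function name is hypothetical, and a temporary directory stands in for the real config directory, whose full path depends on the volume's mount point.

```python
import os
import tempfile


def write_dictionary(path, terms):
    """Write one term per line, UTF-8 encoded, skipping blanks and duplicates."""
    seen = set()
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for term in terms:
            term = term.strip()
            if term and term not in seen:
                seen.add(term)
                handle.write(term + "\n")
    return len(seen)


# Demonstrated against a temporary directory; on the server the target would be
# the mx.dic file next to IKAnalyzer.cfg.xml.
with tempfile.TemporaryDirectory() as config_dir:
    dic_path = os.path.join(config_dir, "mx.dic")
    count = write_dictionary(dic_path, ["非遗", "客家", "非遗", " "])
    print(count)  # 2
```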

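Pinyin analyzer

Some search sites show suggestions as the user types pinyin. That relies on a pinyin analyzer, for which a plugin also exists. To complete queries from typed letters, documents must be tokenized by pinyin.

Repository: https://github.com/medcl/elasticsearch-analysis-pinyin

It is installed the same way as the Chinese analyzer; restart es afterwards.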

Usage

Request body:

{
    "analyzer":"pinyin",
    "text":"客家"
}

Response:

{
    "tokens": [
        {
            "token": "ke",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "kj",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "jia",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 1
        }
    ]
}
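The position field is what ties these tokens back to the characters: ke and jia are the full pinyin of 客 and 家, while kj, the first-letter abbreviation of the whole input, shares position 0 with ke. A small sketch that groups the tokens of the response above by position makes this visible; the helper name is hypothetical.

```python
from collections import defaultdict


def tokens_by_position(tokens):
    """Group token strings by their position in the token stream."""
    grouped = defaultdict(list)
    for token in tokens:
        grouped[token["position"]].append(token["token"])
    return dict(grouped)


# Tokens and positions copied from the response above.
response_tokens = [
    {"token": "ke", "position": 0},
    {"token": "kj", "position": 0},
    {"token": "jia", "position": 1},
]
print(tokens_by_position(response_tokens))  # {0: ['ke', 'kj'], 1: ['jia']}
```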
