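1. Introduction to Elasticsearch
1.1 What Is Elasticsearch
Elasticsearch is an open-source, distributed, RESTful search and data analytics engine built on Apache Lucene. Shay Banon released the first version in 2010; its precursor, Compass, grew out of a search tool he was building for his wife's collection of cooking recipes. Today Elasticsearch is the core component of the ELK stack and is widely used for search and data analysis.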
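1.2 Core Features and Advantages
Elasticsearch has several standout features. First, near-real-time search: newly indexed documents become searchable after the next index refresh, which happens about once per second by default. Second, a distributed architecture that scales horizontally and can handle petabyte-scale data. Third, a RESTful API, so complex searches can be run with plain HTTP requests. Finally, multi-tenancy: many indices can coexist in one cluster. (Older versions allowed several mapping types per index; mapping types were deprecated in 7.x and removed in 8.x.)
Compared with a traditional database, Elasticsearch's biggest advantages are its full-text search and its data analysis capabilities. Its inverted index makes full-text search far faster than LIKE pattern matching in a traditional database, and its distributed architecture provides high availability and scalability.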
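1.3 Architecture
Elasticsearch is written in Java. A cluster is made up of nodes that take on roles such as master, data, and coordinating. The elected master node manages the cluster (creating and deleting indices, tracking nodes, allocating shards); data nodes store data and execute search operations; coordinating nodes route requests to the relevant shards and merge the results. By default, every node can act as a coordinating node.
Data is stored in an inverted index, which gives Elasticsearch a natural advantage in full-text search. An inverted index maps each term to the documents that contain it, so the engine can locate matching documents quickly instead of scanning every document.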
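To make the term-to-document mapping concrete, here is a minimal Python sketch of an inverted index. It is a toy under stated assumptions: the names `build_inverted_index` and `search` are hypothetical, the tokenizer simply splits on whitespace, and Lucene's actual data structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer; real analyzers do much more than this.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


docs = {
    1: "wireless noise cancelling headphones",
    2: "wireless vacuum cleaner",
    3: "smart air purifier",
}
index = build_inverted_index(docs)
print(index["wireless"])                  # [1, 2]
print(search(index, "wireless vacuum"))   # [2]
```

A lookup touches only the posting lists of the query terms, which is why the cost depends on how many documents match rather than on the total number of documents.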
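1.4 Main Use Cases
Elasticsearch is widely used in many areas: product search on e-commerce platforms, full-text retrieval in content management systems, log analysis and monitoring, security information and event management (SIEM), and business data analysis. For workloads that involve large volumes of text with complex search and analysis, it has become a de facto standard.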
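1.5 The ELK Stack
Elasticsearch is usually deployed together with Logstash and Kibana, forming the ELK stack:
- Elasticsearch: the core search engine, responsible for storing and retrieving data
- Logstash: a data collection and processing tool that supports many input and output formats
- Kibana: a data visualization platform that provides dashboards and charts
In addition, the Beats family of data shippers (Filebeat, Metricbeat, and others) is an important part of the ecosystem. Beats collect data from various sources and send it to Elasticsearch.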
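2. Installing and Deploying Elasticsearch
2.1 System Requirements
Elasticsearch has some requirements for its host environment. Linux is recommended; Windows also works, but production deployments usually run on Linux for better performance and stability. Plan for at least 4 GB of memory (16 GB or more in production) and size disk space according to data volume. Recent versions do not need a separate Java installation: Elasticsearch 7.0 and later bundle a JDK, and 8.x requires Java 17 or later if you supply your own.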
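2.2 Installing with Docker
Docker is the most convenient way to install Elasticsearch. Users in mainland China can speed up image pulls with a registry mirror such as Alibaba Cloud's: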
bash
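# Pull the image
docker pull elasticsearch:8.11.0
# Create the data and log directories
mkdir -p /usr/local/elasticsearch/data
mkdir -p /usr/local/elasticsearch/logs
# The container runs as uid 1000, gid 0, so it must be able to write to the bind mounts
chown -R 1000:0 /usr/local/elasticsearch
# Start the container (single node, security disabled: suitable for local development only)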
docker run -d --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "xpack.security.enabled=false" \
-v /usr/local/elasticsearch/data:/usr/share/elasticsearch/data \
-v /usr/local/elasticsearch/logs:/usr/share/elasticsearch/logs \
elasticsearch:8.11.0
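2.3 Installing from a Downloaded Package
Packages are available from the download page on the official Elasticsearch website. Because downloads from overseas servers can be slow, the Tsinghua University mirror is recommended: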
bash
# Download the RPM package
wget https://mirrors.tuna.tsinghua.edu.cn/elasticstack/8.x/yum/elasticsearch-8.11.0-x86_64.rpm
# Install it
rpm -ivh elasticsearch-8.11.0-x86_64.rpm
# Enable the service at boot and start it
systemctl enable elasticsearch
systemctl start elasticsearch
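2.4 Verifying the Installation
After installation, request http://localhost:9200. With security disabled, as in the Docker example above, you should see a JSON response similar to the one below. A default 8.x package installation enables security and TLS, so there you would request https://localhost:9200 and authenticate as the elastic user.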
json
{
"name" : "node-1",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "xxx",
"version" : {
"number" : "8.11.0",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "xxx",
"build_date" : "2023-12-11T11:03:35.984447776Z",
"build_snapshot" : false,
"lucene_version" : "9.8.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
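When the check is automated, the field that usually matters is `version.number`. The sketch below assumes the response body has already been fetched as a string; `parse_version` is a hypothetical helper name, and the sample string is a trimmed copy of the response above.

```python
import json


def parse_version(root_response: str) -> tuple:
    """Extract version.number from the root endpoint's JSON body as a tuple of ints."""
    info = json.loads(root_response)
    return tuple(int(part) for part in info["version"]["number"].split("."))


sample = '{"name": "node-1", "version": {"number": "8.11.0"}, "tagline": "You Know, for Search"}'
version = parse_version(sample)
print(version)                # (8, 11, 0)
print(version >= (8, 0, 0))   # True
```

Comparing tuples of integers avoids the pitfalls of comparing version strings, where "8.9.0" would sort after "8.11.0".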
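3. Basic Usage
3.1 Core Concepts
Before using Elasticsearch, it helps to understand a few core concepts. An index is roughly analogous to a table in a relational database (older material compares it to a database, from the time when one index could hold several mapping types; types were removed in 8.x). A document is analogous to a row, and a field to a column. Elasticsearch stores data as JSON documents and exposes its operations through a RESTful API.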
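3.2 Creating an Index and Mapping
The following example creates a product index. The ik_max_word and ik_smart analyzers come from the IK analysis plugin, which must be installed first; without it, index creation fails because the analyzers are unknown: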
bash
# Create the index
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"price": {
"type": "double"
},
"category": {
"type": "keyword"
},
"description": {
"type": "text",
"analyzer": "ik_max_word"
},
"tags": {
"type": "keyword"
},
"created_at": {
"type": "date"
}
}
}
}'
3.3 Inserting Data
Add documents to the index:
bash
# Add a single document
curl -X POST "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
"name": "Apple iPhone 15 Pro",
"price": 7999.00,
"category": "手机数码",
"description": "苹果最新旗舰手机,搭载A17 Pro芯片,钛金属机身设计",
"tags": ["苹果", "5G", "高端手机"],
"created_at": "2024-01-15T10:30:00"
}'
# Add documents in bulk (the body is newline-delimited JSON and must end with a newline)
curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "index" : { "_id" : "2" } }
{ "name": "华为Mate 60 Pro", "price": 6999.00, "category": "手机数码", "description": "华为回归之作,麒麟芯片回归", "tags": ["华为", "5G", "卫星通话"], "created_at": "2024-01-16T09:00:00" }
{ "index" : { "_id" : "3" } }
{ "name": "小米14 Ultra", "price": 5999.00, "category": "手机数码", "description": "小米影像旗舰,徕卡加持", "tags": ["小米", "影像", "徕卡"], "created_at": "2024-01-17T14:20:00" }
'
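Bulk bodies are easy to get wrong by hand, because each document needs its own action line and the body must end with a newline. A minimal Python sketch of building such a body follows; `build_bulk_body` is a hypothetical helper, and sending the result over HTTP is left out.

```python
import json


def build_bulk_body(docs):
    """Serialize (doc_id, source) pairs into an NDJSON body for the _bulk API."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the document under the given _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # ensure_ascii=False keeps Chinese text readable; send the body as UTF-8.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the final line to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("2", {"name": "华为Mate 60 Pro", "price": 6999.00}),
    ("3", {"name": "小米14 Ultra", "price": 5999.00}),
])
print(body, end="")
```

Because `json.dumps` never emits raw newlines, each action and each document stays on a single line, which is what the NDJSON format expects.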
3.4 Basic Queries
Examples of the main query types:
bash
# Match query (full-text search)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"description": "苹果 芯片"
}
}
}'
# Exact match (term query on a keyword field)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"term": {
"category": "手机数码"
}
}
}'
# Range query (filter by price)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"range": {
"price": {
"gte": 5000,
"lte": 8000
}
}
}
}'
# Compound query (combining several conditions)
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{"match": {"description": "旗舰"}}
],
"filter": [
{"range": {"price": {"lte": 7000}}},
{"term": {"category": "手机数码"}}
]
}
}
}'
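In application code, a compound query like the last one is usually assembled from optional user input. The Python sketch below shows one way to do that; `build_product_query` and its parameters are hypothetical, and it only builds the request body.

```python
import json


def build_product_query(keyword=None, category=None, max_price=None):
    """Build a bool query: scored full-text clauses in must, exact conditions in filter."""
    must, filters = [], []
    if keyword:
        must.append({"match": {"description": keyword}})
    if category:
        filters.append({"term": {"category": category}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    bool_query = {}
    if must:
        bool_query["must"] = must
    if filters:
        bool_query["filter"] = filters
    # With no conditions at all, fall back to matching every document.
    return {"query": {"bool": bool_query} if bool_query else {"match_all": {}}}


print(json.dumps(build_product_query("旗舰", "手机数码", 7000), ensure_ascii=False, indent=2))
```

Keeping the keyword in `must` and the exact conditions in `filter` mirrors the curl example: only the full-text clause contributes to the relevance score.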
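4. Case Study: An E-commerce Product Search System
4.1 Requirements
Build a simple e-commerce product search system that supports searching by product name, filtering by category, filtering by price range, and sorting. The case study shows how Elasticsearch is applied in a real project.
4.2 Preparing the Data
First, prepare a batch of product data. The mapping again uses the ik_max_word and ik_smart analyzers, which require the IK analysis plugin to be installed: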
bash
# Create a more complete product index
curl -X PUT "localhost:9200/ecommerce_products" -H 'Content-Type: application/json' -d'
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"product_id": {"type": "keyword"},
"name": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart",
"fields": {
"keyword": {"type": "keyword"}
}
},
"brand": {"type": "keyword"},
"category": {"type": "keyword"},
"price": {"type": "double"},
"stock": {"type": "integer"},
"rating": {"type": "double"},
"sales_count": {"type": "integer"},
"description": {
"type": "text",
"analyzer": "ik_max_word"
},
"tags": {"type": "keyword"},
"created_at": {"type": "date"},
"updated_at": {"type": "date"}
}
}
}'
# Bulk-insert the product data (newline-delimited JSON)
curl -X POST "localhost:9200/ecommerce_products/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{ "index" : { "_id" : "p001" } }
{ "product_id": "P001", "name": "戴森吸尘器V15", "brand": "戴森", "category": "家用电器", "price": 4590.00, "stock": 100, "rating": 4.8, "sales_count": 1250, "description": "戴森最新款无线吸尘器,激光探测微尘技术", "tags": ["无绳", "除螨", "宠物家庭"], "created_at": "2024-01-01T00:00:00", "updated_at": "2024-01-01T00:00:00" }
{ "index" : { "_id" : "p002" } }
{ "product_id": "P002", "name": "索尼WH-1000XM5耳机", "brand": "索尼", "category": "数码配件", "price": 2499.00, "stock": 200, "rating": 4.9, "sales_count": 890, "description": "索尼旗舰降噪耳机,30小时续航", "tags": ["降噪", "无线", "Hi-Res"], "created_at": "2024-01-02T00:00:00", "updated_at": "2024-01-02T00:00:00" }
{ "index" : { "_id" : "p003" } }
{ "product_id": "P003", "name": "海尔冰箱BCD-501W", "brand": "海尔", "category": "家用电器", "price": 3299.00, "stock": 50, "rating": 4.6, "sales_count": 567, "description": "海尔501升双开门冰箱,风冷无霜", "tags": ["双开门", "节能", "智能控温"], "created_at": "2024-01-03T00:00:00", "updated_at": "2024-01-03T00:00:00" }
{ "index" : { "_id" : "p004" } }
{ "product_id": "P004", "name": "联想ThinkPad X1 Carbon", "brand": "联想", "category": "电脑办公", "price": 12999.00, "stock": 30, "rating": 4.7, "sales_count": 234, "description": "联想商务旗舰笔记本,碳纤维机身", "tags": ["商务本", "轻薄", "指纹识别"], "created_at": "2024-01-04T00:00:00", "updated_at": "2024-01-04T00:00:00" }
{ "index" : { "_id" : "p005" } }
{ "product_id": "P005", "name": "小米空气净化器Pro", "brand": "小米", "category": "家用电器", "price": 1499.00, "stock": 150, "rating": 4.5, "sales_count": 2340, "description": "小米智能空气净化器,PM2.5实时显示", "tags": ["智能", "静音", "App控制"], "created_at": "2024-01-05T00:00:00", "updated_at": "2024-01-05T00:00:00" }
'
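4.3 Implementing Search
The following queries implement the individual search features:
4.3.1 Keyword Search
bash
# Search for products containing "无线" (wireless); matches in name are boosted 3x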
curl -X GET "localhost:9200/ecommerce_products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"multi_match": {
"query": "无线",
"fields": ["name^3", "description", "tags"],
"type": "best_fields"
}
},
"highlight": {
"fields": {
"name": {},
"description": {}
}
}
}'
4.3.2 Filtering by Category
bash
# Filter for products in the 家用电器 (home appliances) category, sorted by sales
curl -X GET "localhost:9200/ecommerce_products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": [
{"term": {"category": "家用电器"}}
]
}
},
"sort": [
{"sales_count": {"order": "desc"}}
]
}'
4.3.3 Price Range and Sorting
bash
# Price between 1000 and 5000 yuan, sorted by rating (descending), then by sales
curl -X GET "localhost:9200/ecommerce_products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": [
{"range": {"price": {"gte": 1000, "lte": 5000}}}
]
}
},
"sort": [
{"rating": {"order": "desc"}},
{"sales_count": {"order": "desc"}}
]
}'
4.3.4 Combined Search (Multiple Conditions)
bash
# Keyword search filtered by category, price, rating, and stock
curl -X GET "localhost:9200/ecommerce_products/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "智能",
"fields": ["name^2", "description", "tags"],
"fuzziness": "AUTO"
}
}
],
"filter": [
{"terms": {"category": ["家用电器", "数码配件"]}},
{"range": {"price": {"lte": 5000}}},
{"range": {"rating": {"gte": 4.5}}},
{"range": {"stock": {"gt": 0}}}
]
}
},
"sort": [
{"_score": {"order": "desc"}},
{"sales_count": {"order": "desc"}}
],
"from": 0,
"size": 10,
"highlight": {
"pre_tags": ["<em>"],
"post_tags": ["</em>"],
"fields": {
"name": {"fragment_size": 50},
"description": {"fragment_size": 100}
}
}
}'
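The `from` and `size` values in the request above are normally derived from a page number. A minimal Python sketch, assuming 1-based page numbers and the default `index.max_result_window` of 10,000 (`page_params` is a hypothetical helper):

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page, size=10):
    """Translate a 1-based page number into from/size values for the search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        # Elasticsearch rejects requests where from + size exceeds the window.
        raise ValueError("from + size exceeds the result window; consider search_after")
    return {"from": offset, "size": size}


print(page_params(1))       # {'from': 0, 'size': 10}
print(page_params(3, 20))   # {'from': 40, 'size': 20}
```

Checking the limit on the client side gives users a clear error instead of a failed search request when they jump to a very deep page.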
4.4 Aggregations and Statistics
bash
# Per category: product count, average price, total sales, and average rating
curl -X GET "localhost:9200/ecommerce_products/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"categories": {
"terms": {"field": "category"},
"aggs": {
"avg_price": {"avg": {"field": "price"}},
"total_sales": {"sum": {"field": "sales_count"}},
"avg_rating": {"avg": {"field": "rating"}}
}
}
}
}'
# Distribution of products across price ranges
curl -X GET "localhost:9200/ecommerce_products/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{"to": 2000, "key": "2000元以下"},
{"from": 2000, "to": 5000, "key": "2000-5000元"},
{"from": 5000, "to": 10000, "key": "5000-10000元"},
{"from": 10000, "key": "10000元以上"}
]
}
}
}
}'
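To sanity-check what these two aggregations should return, the same calculations can be reproduced in memory over the five sample products from section 4.2. This Python sketch is an illustration rather than how Elasticsearch computes aggregations. The function names are hypothetical; the bucket rule (`from` inclusive, `to` exclusive) follows the range aggregation's documented behavior.

```python
from collections import defaultdict

# (category, price, sales_count) for the five sample products
PRODUCTS = [
    ("家用电器", 4590.0, 1250),
    ("数码配件", 2499.0, 890),
    ("家用电器", 3299.0, 567),
    ("电脑办公", 12999.0, 234),
    ("家用电器", 1499.0, 2340),
]

# (key, from, to): from is inclusive, to is exclusive, None means unbounded
RANGES = [
    ("2000元以下", None, 2000),
    ("2000-5000元", 2000, 5000),
    ("5000-10000元", 5000, 10000),
    ("10000元以上", 10000, None),
]


def terms_with_metrics(products):
    """Group by category, then compute doc_count, avg_price, and total_sales."""
    grouped = defaultdict(list)
    for category, price, sales in products:
        grouped[category].append((price, sales))
    return {
        category: {
            "doc_count": len(rows),
            "avg_price": sum(p for p, _ in rows) / len(rows),
            "total_sales": sum(s for _, s in rows),
        }
        for category, rows in grouped.items()
    }


def range_buckets(products, ranges):
    """Count products per price range using the inclusive-from, exclusive-to rule."""
    counts = {key: 0 for key, _, _ in ranges}
    for _, price, _ in products:
        for key, low, high in ranges:
            if (low is None or price >= low) and (high is None or price < high):
                counts[key] += 1
    return counts


print(terms_with_metrics(PRODUCTS)["家用电器"])
print(range_buckets(PRODUCTS, RANGES))
```

With the sample data, the 家用电器 bucket holds three products with total sales of 4157, and the price buckets contain 1, 3, 0, and 1 products respectively.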
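4.5 Performance Tuning Tips
Several performance points follow from this case study:
- Size shards sensibly: a single shard is enough for small data sets; for larger ones, adjust the shard count based on document volume and hardware.
- Choose an appropriate analyzer: for Chinese text, the IK analyzer improves search accuracy.
- Avoid deep pagination: prefer search_after (ideally with a point in time) over large from + size offsets; the scroll API is better suited to bulk exports than to user-facing paging.
- Use filters well: put non-scoring conditions in the filter clause of a bool query, which skips relevance scoring and lets Elasticsearch cache the results.
- Optimize field types: use the keyword type for fields that do not need analysis, which saves storage and supports exact matching, sorting, and aggregations.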
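The idea behind search_after can be simulated in memory: each page resumes strictly after the sort values of the last hit on the previous page, so the cost does not grow with page depth. The Python sketch below is a simulation and does not call Elasticsearch. `search_page` and `iterate_all` are hypothetical names, and the sort (sales_count descending, product_id as a unique tiebreaker) is an assumption chosen for illustration.

```python
def sort_key(doc):
    # Negating sales_count gives descending order; product_id breaks ties uniquely.
    return (-doc["sales_count"], doc["product_id"])


def search_page(docs, size, search_after=None):
    """Return the next page of docs that sort strictly after the given cursor."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        ordered = [d for d in ordered if sort_key(d) > search_after]
    return ordered[:size]


def iterate_all(docs, size):
    """Yield pages until no hits remain, carrying the last hit's sort values forward."""
    cursor = None
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            break
        yield page
        cursor = sort_key(page[-1])


docs = [
    {"product_id": "P001", "sales_count": 1250},
    {"product_id": "P002", "sales_count": 890},
    {"product_id": "P003", "sales_count": 567},
    {"product_id": "P004", "sales_count": 234},
    {"product_id": "P005", "sales_count": 2340},
]
for page in iterate_all(docs, size=2):
    print([d["product_id"] for d in page])
```

The unique tiebreaker matters: without it, two documents with the same sales_count could be skipped or repeated at a page boundary.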
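This case study walks through the full path from installing Elasticsearch to using it in an application, covering data modeling, index design, search implementation, and performance tuning, and can serve as a reference for real projects.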