Elasticsearch: A Complete Guide
Contents
- Introduction to Elasticsearch
- Core Concepts
- Usage Tips
- Key and Difficult Topics
- [Spring Boot Integration](#spring-boot-integration)
- Best Practices
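Introduction to Elasticsearch
What Is Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It provides near-real-time search and analysis, is the core component of the Elastic Stack (formerly known as the ELK Stack), and is widely used for log analytics, full-text search, and monitoring.
Key Features
- Distributed architecture: scales horizontally, with automatic sharding and replication
- Near-real-time search: newly indexed documents become searchable after the next refresh (every second by default)
- Full-text search: powerful full-text search built on Lucene
- RESTful API: a simple, easy-to-use JSON-over-HTTP interface
- Multi-tenancy: many independent indices can coexist in one cluster (multiple document types per index have not been supported since 6.0)
- Aggregations: powerful data aggregation and analytics
Use Cases
- Enterprise search
- Log analytics and monitoring
- E-commerce product search
- Content management systems
- Real-time data analytics
- Security information analysis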
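Core Concepts
Cluster Architecture
```
Elasticsearch cluster
├── Node - a single Elasticsearch instance
│   ├── Master node - manages cluster state
│   ├── Data node - stores data and executes searches
│   └── Coordinating node - routes requests and merges results
├── Index - logical container for documents
│   ├── Shard - physical partition of the data (a Lucene index)
│   │   ├── Primary shard - receives every write first
│   │   └── Replica shard - copy of a primary; provides redundancy and also serves searches
│   └── Mapping - field type definitions
└── Document - smallest unit of data (a JSON object)
```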
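To see why the number of primary shards is fixed when an index is created, here is a minimal sketch of document routing. Elasticsearch documents the routing rule as roughly `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The real implementation uses a Murmur3 hash and an extra routing factor, so `String.hashCode()` below is only a stand-in, and the class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of simplified document-to-shard routing. */
public class RoutingSketch {

    /** Stand-in for hash(_routing) % number_of_primary_shards (Elasticsearch uses Murmur3). */
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    /** Counts how many ids would map to a different shard if the primary shard count changed. */
    static long movedDocuments(List<String> ids, int oldShards, int newShards) {
        return ids.stream()
                .filter(id -> shardFor(id, oldShards) != shardFor(id, newShards))
                .count();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "10");
        for (String id : ids) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, 3));
        }
        // If the shard count changed, most documents would no longer be found on the
        // shard that routing points to, which is why number_of_shards is fixed after creation.
        System.out.println("moved when going from 3 to 4 shards: " + movedDocuments(ids, 3, 4));
    }
}
```

Changing the shard count of an existing index therefore means reindexing, or using the shrink/split APIs, which rewrite the data into a new index.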
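Data Types
- Core types: text, keyword, long, integer, double, boolean, date
- Complex types: object, nested (there is no dedicated array type: any field can hold an array of values of its type)
- Geo types: geo_point, geo_shape
- Specialized types: ip, completion, token_count, percolator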
Usage Tips
1. Index Management
```json
// Create an index with a custom analyzer and explicit mappings.
// title.keyword is an unanalyzed sub-field for exact matching, sorting and aggregations.
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      }
    }
  }
}
```
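The `my_analyzer` chain above (standard tokenizer, then `lowercase`, then `stop`) can be approximated in plain Java to show which terms end up in the inverted index. This is a simplified sketch under stated assumptions: the real `standard` tokenizer follows Unicode text segmentation (UAX #29), the stop list here is only a subset of Lucene's default English stop words, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified model of: standard tokenizer -> lowercase -> stop. */
public class AnalyzerSketch {

    // Subset of Lucene's default English stop words (the real list is longer)
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "are", "is", "of", "the", "to", "with");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Approximate tokenizer: split on every run of characters that are not letters or digits
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT); // lowercase filter
            if (!STOP_WORDS.contains(token)) {            // stop filter, applied after lowercasing
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Complete Guide to Elasticsearch!"));
    }
}
```

The filter order matters: the `stop` filter is case-sensitive by default, so running `lowercase` first is what lets "The" be removed.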
2. Search Queries
```json
// Full-text search with filters, aggregations, sorting and pagination.
// must clauses contribute to the relevance score; filter clauses only include or exclude documents.
// "title^2" boosts matches in title by a factor of 2.
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch tutorial",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "created_at": {
              "gte": "2023-01-01",
              "lte": "2023-12-31"
            }
          }
        },
        {
          "terms": {
            "tags": ["search", "tutorial"]
          }
        }
      ]
    }
  },
  "aggs": {
    "tag_counts": {
      "terms": {
        "field": "tags",
        "size": 10
      }
    },
    "posts_per_month": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      }
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 20
}
```
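`"fuzziness": "AUTO"` lets each query term match indexed terms within an edit distance that depends on the term's length. The Elasticsearch documentation defines `AUTO` (equivalent to `AUTO:3,6`) as: terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. The sketch below mirrors that rule with an optimal-string-alignment distance, in which swapping two adjacent characters counts as one edit, as with the default `fuzzy_transpositions: true`. It illustrates the matching rule only: Lucene actually uses Levenshtein automata over the term dictionary, and the names here are hypothetical.

```java
/** Hypothetical sketch of how "fuzziness": "AUTO" bounds the allowed edit distance. */
public class FuzzinessSketch {

    /** AUTO (= AUTO:3,6): 0 edits for length 0-2, 1 edit for 3-5, 2 edits for longer terms. */
    static int autoMaxEdits(String term) {
        int len = term.length();
        if (len < 3) {
            return 0;
        }
        return len < 6 ? 1 : 2;
    }

    /** Optimal string alignment distance: insert, delete, substitute, or swap adjacent chars. */
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                boolean swapped = i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1);
                if (swapped) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** True if an indexed term would match the query term under AUTO fuzziness. */
    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return editDistance(queryTerm, indexedTerm) <= autoMaxEdits(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("tutorail", "tutorial")); // one transposition, long term
        System.out.println(fuzzyMatches("es", "ex"));             // short terms must match exactly
    }
}
```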
3. Aggregations
```json
// Sales per category (sum, average and monthly trend) plus overall totals.
// "size": 0 skips returning hits; only aggregation results are returned.
// The global aggregation ignores the search query and always covers every document in the index.
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_by_category": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "amount"
          }
        },
        "avg_order_value": {
          "avg": {
            "field": "amount"
          }
        },
        "sales_trend": {
          "date_histogram": {
            "field": "order_date",
            "calendar_interval": "month"
          },
          "aggs": {
            "monthly_sales": {
              "sum": {
                "field": "amount"
              }
            }
          }
        }
      }
    },
    "global_sales_stats": {
      "global": {},
      "aggs": {
        "total_revenue": {
          "sum": {
            "field": "amount"
          }
        },
        "avg_order_value": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
```
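To make the nested aggregation tree concrete, the sketch below computes in memory what `sales_by_category` returns for each bucket: a document count plus the `total_sales` (sum) and `avg_order_value` (avg) sub-aggregations, and the overall figures that `global_sales_stats` reports. It illustrates the semantics only. The `Order` record and sample values are hypothetical, and Elasticsearch computes these per shard and then merges them, which is why `terms` bucket counts can be approximate on multi-shard indices.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory equivalent of terms(category) with sum/avg(amount) sub-aggregations. */
public class AggregationSketch {

    record Order(String category, double amount) {}

    /** Per bucket: getCount() = doc_count, getSum() = total_sales, getAverage() = avg_order_value. */
    static Map<String, DoubleSummaryStatistics> salesByCategory(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                Order::category,
                TreeMap::new, // sorted keys for stable output; ES sorts terms buckets by doc_count by default
                Collectors.summarizingDouble(Order::amount)));
    }

    /** Counterpart of the global aggregation: statistics over every order. */
    static DoubleSummaryStatistics globalStats(List<Order> orders) {
        return orders.stream().mapToDouble(Order::amount).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("books", 20.0),
                new Order("books", 40.0),
                new Order("games", 60.0));
        salesByCategory(orders).forEach((category, stats) -> System.out.printf(
                "%s: doc_count=%d total_sales=%.1f avg_order_value=%.1f%n",
                category, stats.getCount(), stats.getSum(), stats.getAverage()));
        System.out.printf("total_revenue=%.1f%n", globalStats(orders).getSum());
    }
}
```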
4. Bulk Operations
```json
// Bulk-index documents: each action line is followed by its source line,
// and the request body must end with a newline
POST /_bulk
{"index":{"_index":"my_index","_id":"1"}}
{"title":"Elasticsearch Guide","content":"Complete guide to ES","tags":["search","guide"]}
{"index":{"_index":"my_index","_id":"2"}}
{"title":"Spring Boot Integration","content":"How to integrate ES with Spring Boot","tags":["spring","integration"]}

// Bulk partial updates: each "doc" is merged into the existing document
POST /_bulk
{"update":{"_index":"my_index","_id":"1"}}
{"doc":{"views":100,"updated_at":"2023-12-01"}}
{"update":{"_index":"my_index","_id":"2"}}
{"doc":{"views":50,"updated_at":"2023-12-01"}}
```
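A `_bulk` body is newline-delimited JSON (NDJSON): one action/metadata line, then for `index`, `create` and `update` one source line, and every line, including the last, ends with `\n`. The following is a minimal JDK-only sketch of assembling such a body. The class and method names are hypothetical, documents are assumed to be flat maps of string fields, and the small escaper handles only quotes, backslashes and control characters. A real application would use a JSON library or the client's bulk support instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles an NDJSON body for POST /_bulk. */
public class BulkBodySketch {

    /** Minimal JSON string escaping: quotes, backslashes and control characters only. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds "index" actions for documents given as id -> (field -> value). */
    static String indexBody(String index, Map<String, Map<String, String>> docsById) {
        StringBuilder body = new StringBuilder();
        docsById.forEach((id, fields) -> {
            // Line 1: action and metadata
            body.append("{\"index\":{\"_index\":").append(quote(index))
                .append(",\"_id\":").append(quote(id)).append("}}\n");
            // Line 2: the document source
            StringBuilder source = new StringBuilder("{");
            fields.forEach((key, value) -> {
                if (source.length() > 1) {
                    source.append(',');
                }
                source.append(quote(key)).append(':').append(quote(value));
            });
            body.append(source).append("}\n"); // the final newline is required as well
        });
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        docs.put("1", Map.of("title", "Elasticsearch Guide"));
        docs.put("2", Map.of("title", "Spring Boot Integration"));
        System.out.print(indexBody("my_index", docs));
    }
}
```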
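Key and Difficult Topics
1. Shard Strategy
```json
// Sizing primary shards (rules of thumb, not hard limits):
// shards ~ expected data volume / target shard size (30-50 GB per shard is a common target)
// Some guides also suggest shards ~ CPU cores * 2; treat that only as a rough starting point.
// number_of_shards cannot be changed after index creation without reindexing (or the _split/_shrink APIs).

// Shard allocation filtering: keep this index on nodes started with node.attr.box_type: hot
PUT /logs/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}

// Force-merge to a single segment; only do this on indices that no longer receive writes
POST /logs/_forcemerge?max_num_segments=1
```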
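The size-based sizing rule in the comments above is simple arithmetic. The sketch below turns it into a helper. The 50 GB target in the example and the class and method names are assumptions for illustration. Because primary shards are fixed at creation, it is usually sized for the expected future volume of primary data rather than today's.

```java
/** Hypothetical helper: primary shard count from expected data volume and a target shard size. */
public class ShardSizing {

    /** ceil(expectedDataGb / targetShardGb), with a minimum of one shard. */
    static int primaryShards(double expectedDataGb, double targetShardGb) {
        if (expectedDataGb < 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("expectedDataGb must be >= 0 and targetShardGb > 0");
        }
        return Math.max(1, (int) Math.ceil(expectedDataGb / targetShardGb));
    }

    public static void main(String[] args) {
        // e.g. 400 GB of primary data expected, aiming for about 50 GB per shard
        System.out.println(primaryShards(400, 50));
    }
}
```

Replicas do not change this number: each replica adds another full copy of every primary shard.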
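2. Performance Optimization
```json
// Query optimization: filter context skips scoring and can be cached.
// Use _source filtering to return only the fields you need, and keep pages small;
// for deep pagination use search_after instead of a large from/size.
GET /my_index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  },
  "_source": ["title", "content"],
  "size": 20
}

// Indexing optimization for large bulk loads (restore the previous values afterwards):
// - refresh_interval "30s": refresh less often (the default is 1s)
// - number_of_replicas 0: skip replica writes during the load, then set it back (e.g. to 1)
// - translog.durability "async": fsync the translog in the background; operations acknowledged
//   since the last sync (every 5s by default) can be lost if a node crashes
PUT /my_index/_settings
{
  "index.refresh_interval": "30s",
  "index.number_of_replicas": 0,
  "index.translog.durability": "async"
}
```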
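Instead of a large `from`/`size`, `search_after` pages through results by sending the sort values of the last hit of the previous page, so each request only has to collect the next `size` hits. The in-memory sketch below mimics that cursor with a sort on `created_at` descending plus a unique tie-breaker (the document id here). The tie-breaker ensures hits with equal timestamps are neither skipped nor repeated. The `Doc` record and method names are hypothetical. Against a real cluster, `search_after` is typically combined with a point in time (PIT) to get a consistent view across pages.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory model of search_after pagination. */
public class SearchAfterSketch {

    record Doc(String id, long createdAt) {}

    // Sort: created_at descending, then id ascending as a unique tie-breaker
    static final Comparator<Doc> SORT =
            Comparator.comparingLong(Doc::createdAt).reversed().thenComparing(Doc::id);

    /** Up to `size` docs that sort strictly after `after` (null means the first page). */
    static List<Doc> nextPage(List<Doc> docs, Doc after, int size) {
        return docs.stream()
                .sorted(SORT)
                .filter(d -> after == null || SORT.compare(d, after) > 0)
                .limit(size)
                .toList();
    }

    /** Walks every page and returns the ids in the order they were served. */
    static List<String> walkAllPages(List<Doc> docs, int size) {
        List<String> served = new ArrayList<>();
        Doc cursor = null; // no search_after on the first request
        while (true) {
            List<Doc> hits = nextPage(docs, cursor, size);
            if (hits.isEmpty()) {
                return served;
            }
            hits.forEach(d -> served.add(d.id()));
            cursor = hits.get(hits.size() - 1); // the last hit's sort values become search_after
        }
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", 3), new Doc("b", 2), new Doc("c", 2),
                new Doc("d", 1), new Doc("e", 1));
        System.out.println(walkAllPages(docs, 2));
    }
}
```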
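3. Cluster Management
```json
// Cluster health
GET /_cluster/health?pretty

// Node statistics
GET /_nodes/stats?pretty

// Index statistics
GET /_stats?pretty

// Shard allocation: re-enable allocation for all shards (e.g. after a rolling restart)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
```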
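`GET /_cluster/health` summarizes shard allocation as a status color. The sketch below encodes the documented meaning: `red` if at least one primary shard is unassigned, `yellow` if all primaries are assigned but at least one replica is not, otherwise `green`. It also shows the classic single-node case: a replica is never allocated on the same node as its primary, so one node with `number_of_replicas: 1` stays yellow. The names are hypothetical, and the placement rule deliberately ignores other allocation constraints such as disk watermarks and allocation filters.

```java
/** Hypothetical sketch of how cluster health status follows from shard allocation. */
public class ClusterHealthSketch {

    enum Status { GREEN, YELLOW, RED }

    static Status status(int unassignedPrimaries, int unassignedReplicas) {
        if (unassignedPrimaries > 0) {
            return Status.RED;    // some data is unavailable
        }
        if (unassignedReplicas > 0) {
            return Status.YELLOW; // all data available, but redundancy is reduced
        }
        return Status.GREEN;
    }

    /** Replicas that cannot be placed when every copy of a shard needs its own node. */
    static int unplaceableReplicas(int nodes, int primaries, int replicasPerPrimary) {
        // One node holds the primary, so at most nodes - 1 replicas per shard can be placed
        int placeablePerShard = Math.max(0, Math.min(replicasPerPrimary, nodes - 1));
        return primaries * (replicasPerPrimary - placeablePerShard);
    }

    public static void main(String[] args) {
        // One node, 3 primaries with 1 replica each: the 3 replicas stay unassigned
        System.out.println(status(0, unplaceableReplicas(1, 3, 1)));
        // Adding a second node lets every replica be allocated
        System.out.println(status(0, unplaceableReplicas(2, 3, 1)));
    }
}
```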
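4. Data Modeling
```json
// Parent-child modeling with a join field.
// Child documents must be indexed with routing set to the parent's id so both live on the same shard.
PUT /company
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "join_field": {
        "type": "join",
        "relations": {
          "company": "employee"
        }
      }
    }
  }
}

// Nested objects: each variant is indexed as its own hidden document,
// so conditions on color, size and price are matched within the same variant
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size": { "type": "keyword" },
          "price": { "type": "double" }
        }
      }
    }
  }
}
```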
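The reason `variants` is mapped as `nested` is that Lucene has no notion of inner objects. With the default `object` type, an array of objects is flattened into one array of values per field, so the link between a variant's `color` and its `size` is lost. The sketch below contrasts the two behaviors for a query such as `color = red AND size = XL`. The `Variant` record and method names are hypothetical. In Elasticsearch, nested objects are stored as separate hidden Lucene documents and queried with a `nested` query.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical contrast between object (flattened) and nested matching semantics. */
public class NestedVsObjectSketch {

    record Variant(String color, String size) {}

    /** object mapping: fields are flattened, so each condition may be satisfied by a different variant. */
    static boolean matchesAsObject(List<Variant> variants, String color, String size) {
        Set<String> colors = variants.stream().map(Variant::color).collect(Collectors.toSet());
        Set<String> sizes = variants.stream().map(Variant::size).collect(Collectors.toSet());
        return colors.contains(color) && sizes.contains(size);
    }

    /** nested mapping: both conditions must hold inside the same variant. */
    static boolean matchesAsNested(List<Variant> variants, String color, String size) {
        return variants.stream().anyMatch(v -> v.color().equals(color) && v.size().equals(size));
    }

    public static void main(String[] args) {
        List<Variant> variants = List.of(new Variant("red", "M"), new Variant("blue", "XL"));
        // No variant is a red XL, yet the flattened object mapping still reports a match
        System.out.println("object: " + matchesAsObject(variants, "red", "XL"));
        System.out.println("nested: " + matchesAsNested(variants, "red", "XL"));
    }
}
```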
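Spring Boot Integration
The Java code in this section uses the Spring Data Elasticsearch 4.x API (Spring Boot 2.x), which is built on the `RestHighLevelClient`. Spring Data Elasticsearch 5.x (Spring Boot 3.x) deprecates or removes `ElasticsearchRestTemplate`, `NativeSearchQuery` and `RestHighLevelClient` in favor of `ElasticsearchTemplate` and `NativeQuery` on the new Elasticsearch Java client, so these classes need adapting when upgrading.
1. Dependency Configuration
```xml
<!-- Maven -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
```
```gradle
// Gradle
implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'
```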
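2. Configuration File
```yaml
# application.yml (Spring Boot 2.6+ property names; older versions use spring.elasticsearch.rest.*)
spring:
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 1s
    socket-timeout: 30s
    # Credentials (when Elasticsearch security is enabled)
    # username: elastic
    # password: changeme
    # Multi-node cluster
    # uris: http://es-node1:9200,http://es-node2:9200,http://es-node3:9200
```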
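3. Entity Class
```java
import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "articles")
// Index settings are loaded from es-settings.json on the classpath (e.g. src/main/resources)
@Setting(settingPath = "es-settings.json")
public class Article {

    @Id
    private String id;

    // "ik_max_word" comes from the third-party IK Analysis plugin for Chinese text; it must be
    // installed on every node (otherwise use a built-in analyzer such as "standard")
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String content;

    @Field(type = FieldType.Keyword)
    private List<String> tags;

    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Date)
    private LocalDateTime createdAt;

    @Field(type = FieldType.Long)
    private Long views;

    // Constructors, getters and setters omitted
}
```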
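4. Repository Interface
```java
import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article, String> {

    // Derived query methods: Spring Data builds the query from the method name.
    // "Containing" becomes a wildcard (*term*) query, which can be slow on large indices.
    List<Article> findByTitleContaining(String title);

    List<Article> findByCategoryAndTagsIn(String category, List<String> tags);

    Page<Article> findByCreatedAtBetween(
            LocalDateTime start,
            LocalDateTime end,
            Pageable pageable
    );

    // @Query supplies only the "query" part of the request; ?0 is the first method argument
    @Query("{\"bool\": {\"must\": [{\"match\": {\"title\": \"?0\"}}]}}")
    List<Article> searchByTitle(String title);

    // Aggregations cannot be declared in @Query, because its value is used only as the query clause.
    // Run them through ElasticsearchOperations with a NativeSearchQuery instead (see the service layer).
}
```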
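5. Service Layer
```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.MultiBucketsAggregation;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHitSupport;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.UpdateQuery;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

// Targets Spring Data Elasticsearch 4.3/4.4 (Spring Boot 2.6/2.7).
// Article, ArticleRepository, ArticleNotFoundException and SearchRequest (the application's own
// search DTO, not org.elasticsearch.action.search.SearchRequest) are application classes.
@Service
public class ArticleService {

    @Autowired
    private ArticleRepository articleRepository;

    // Depend on the ElasticsearchOperations interface; Spring Boot (or a custom configuration) provides the bean
    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    // Basic CRUD
    public Article createArticle(Article article) {
        article.setCreatedAt(LocalDateTime.now());
        article.setViews(0L);
        return articleRepository.save(article);
    }

    public Article findById(String id) {
        return articleRepository.findById(id)
                .orElseThrow(() -> new ArticleNotFoundException("Article not found: " + id));
    }

    // Full-text search with optional filters
    public SearchPage<Article> searchArticles(SearchRequest request, Pageable pageable) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
        // Keyword search on title (boosted x2) and content; must clauses affect the score
        if (StringUtils.hasText(request.getKeyword())) {
            queryBuilder.must(QueryBuilders.multiMatchQuery(request.getKeyword())
                    .field("title", 2.0f)
                    .field("content")
                    .type(MultiMatchQueryBuilder.Type.BEST_FIELDS)
                    .fuzziness(Fuzziness.AUTO));
        }
        // Category filter (filter clauses do not affect the score and can be cached)
        if (StringUtils.hasText(request.getCategory())) {
            queryBuilder.filter(QueryBuilders.termQuery("category", request.getCategory()));
        }
        // Tag filter
        if (request.getTags() != null && !request.getTags().isEmpty()) {
            queryBuilder.filter(QueryBuilders.termsQuery("tags", request.getTags()));
        }
        // Date range filter
        if (request.getStartDate() != null && request.getEndDate() != null) {
            queryBuilder.filter(QueryBuilders.rangeQuery("createdAt")
                    .gte(request.getStartDate())
                    .lte(request.getEndDate()));
        }

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(queryBuilder)
                // Relevance first, newest first on ties
                .withSort(SortBuilders.scoreSort().order(SortOrder.DESC))
                .withSort(SortBuilders.fieldSort("createdAt").order(SortOrder.DESC))
                // Keep only page number and size so a Sort in the Pageable cannot interfere with the order above
                .withPageable(PageRequest.of(pageable.getPageNumber(), pageable.getPageSize()))
                .build();

        // search() returns SearchHits; wrap them into a page-aware SearchPage
        SearchHits<Article> searchHits = elasticsearchOperations.search(searchQuery, Article.class);
        return SearchHitSupport.searchPageFor(searchHits, searchQuery.getPageable());
    }

    // Aggregations
    public Map<String, Object> getArticleStats() {
        // Articles per category
        TermsAggregationBuilder categoryAgg = AggregationBuilders
                .terms("category_stats")
                .field("category")
                .size(10);
        // Articles per tag
        TermsAggregationBuilder tagAgg = AggregationBuilders
                .terms("tag_stats")
                .field("tags")
                .size(20);
        // Articles per month
        DateHistogramAggregationBuilder timeAgg = AggregationBuilders
                .dateHistogram("time_trend")
                .field("createdAt")
                .calendarInterval(DateHistogramInterval.MONTH);

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .addAggregation(categoryAgg)
                .addAggregation(tagAgg)
                .addAggregation(timeAgg)
                .build();

        SearchHits<Article> searchHits = elasticsearchOperations.search(searchQuery, Article.class);
        // 4.3/4.4 wrap the client's Aggregations in a container
        // (on 4.0-4.2, getAggregations() returns org.elasticsearch...Aggregations directly)
        Aggregations aggregations = (Aggregations) searchHits.getAggregations().aggregations();

        // Convert buckets to plain key -> doc_count maps so the result serializes cleanly to JSON
        Map<String, Object> stats = new HashMap<>();
        stats.put("category_stats", toBucketCounts(aggregations.get("category_stats")));
        stats.put("tag_stats", toBucketCounts(aggregations.get("tag_stats")));
        stats.put("time_trend", toBucketCounts(aggregations.get("time_trend")));
        return stats;
    }

    private static Map<String, Long> toBucketCounts(MultiBucketsAggregation aggregation) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (MultiBucketsAggregation.Bucket bucket : aggregation.getBuckets()) {
            counts.put(bucket.getKeyAsString(), bucket.getDocCount());
        }
        return counts;
    }

    // Bulk indexing. Elasticsearch has no transactions, so @Transactional would not make these
    // writes atomic; saveAll() sends the documents to Elasticsearch as a bulk request.
    public void bulkIndexArticles(List<Article> articles) {
        articles.forEach(article -> {
            article.setCreatedAt(LocalDateTime.now());
            article.setViews(0L);
        });
        articleRepository.saveAll(articles);
    }

    // Partial update with an inline Painless script
    public void updateArticleViews(String id) {
        UpdateQuery updateQuery = UpdateQuery.builder(id)
                .withScript("ctx._source.views += 1")
                .withLang("painless")
                .build();
        elasticsearchOperations.update(updateQuery, IndexCoordinates.of("articles"));
    }
}
```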
6. Controller Layer
```java
import java.util.List;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.SearchPage;
import org.springframework.data.web.PageableDefault;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/articles")
public class ArticleController {

    @Autowired
    private ArticleService articleService;

    @PostMapping
    public ResponseEntity<Article> createArticle(@RequestBody Article article) {
        Article createdArticle = articleService.createArticle(article);
        return ResponseEntity.status(HttpStatus.CREATED).body(createdArticle);
    }

    @GetMapping("/{id}")
    public ResponseEntity<Article> getArticleById(@PathVariable String id) {
        Article article = articleService.findById(id);
        return ResponseEntity.ok(article);
    }

    // No default sort here: the service orders results by relevance, then by createdAt
    @GetMapping("/search")
    public ResponseEntity<SearchPage<Article>> searchArticles(
            @ModelAttribute SearchRequest request,
            @PageableDefault(size = 20) Pageable pageable) {
        SearchPage<Article> articles = articleService.searchArticles(request, pageable);
        return ResponseEntity.ok(articles);
    }

    @GetMapping("/stats")
    public ResponseEntity<Map<String, Object>> getArticleStats() {
        Map<String, Object> stats = articleService.getArticleStats();
        return ResponseEntity.ok(stats);
    }

    @PostMapping("/bulk")
    public ResponseEntity<Void> bulkIndexArticles(@RequestBody List<Article> articles) {
        articleService.bulkIndexArticles(articles);
        return ResponseEntity.ok().build();
    }

    // Incrementing a counter is not idempotent, so POST fits better than PUT
    @PostMapping("/{id}/views")
    public ResponseEntity<Void> incrementViews(@PathVariable String id) {
        articleService.updateArticleViews(id);
        return ResponseEntity.ok().build();
    }
}
```
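7. Configuration Class
```java
import java.time.Duration;
import java.util.Arrays;

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

// Optional with Spring Boot, which auto-configures the client from spring.elasticsearch.*;
// define this class (Spring Data Elasticsearch 4.x) only when you need custom client settings.
@Configuration
@EnableElasticsearchRepositories(basePackages = "com.example.repository")
public class ElasticsearchConfig extends AbstractElasticsearchConfiguration {

    @Value("${spring.elasticsearch.uris}")
    private String elasticsearchUris;

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // "http://es-node1:9200,http://es-node2:9200" -> ["es-node1:9200", "es-node2:9200"]
        // (if the URIs use https, also call .usingSsl() on the builder)
        String[] hostAndPorts = Arrays.stream(elasticsearchUris.split(","))
                .map(String::trim)
                .map(uri -> uri.replaceFirst("^https?://", ""))
                .toArray(String[]::new);
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo(hostAndPorts)
                .withConnectTimeout(Duration.ofSeconds(5))
                .withSocketTimeout(Duration.ofSeconds(30))
                .build();
        return RestClients.create(clientConfiguration).rest();
    }

    // AbstractElasticsearchConfiguration already registers an ElasticsearchOperations bean
    // (an ElasticsearchRestTemplate), so no second template bean is declared here.
    // No custom converters are needed for java.time types such as LocalDateTime:
    // Spring Data Elasticsearch converts @Field(type = FieldType.Date) properties itself.
}
```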
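Best Practices
1. Index Design
- Shard count: size it from the expected data volume and the available hardware
- Replica count: at least 1 replica in production
- Mappings: choose field types and analyzers deliberately instead of relying on dynamic mapping
- Index lifecycle: use Index Lifecycle Management (ILM) to roll over and delete indices automatically
2. Query Optimization
- Put non-scoring conditions in filter context to skip relevance scoring
- Use _source filtering to return only the fields you need and reduce network transfer
- Avoid deep pagination: from + size is capped by index.max_result_window (10,000 by default), so use search_after instead
- Compute statistics with aggregations instead of in application code
3. Performance Tuning
- Tune refresh_interval to balance search freshness against indexing throughput
- Use the bulk API for batch writes
- Keep shard sizes in a reasonable range (30-50 GB)
4. Security
- Enable the security features (part of X-Pack; enabled by default since Elasticsearch 8.0)
- Configure TLS/SSL encryption
- Define users, roles and privileges
- Back up data regularly with snapshot and restore
5. Monitoring and Maintenance
- Monitor cluster health
- Set up alerting
- Regularly delete indices that are no longer needed
- Monitor query performance (for example with the search slow log)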
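Summary
Elasticsearch is a powerful search engine and analytics platform with a distributed architecture, near-real-time search, and rich aggregation capabilities. With sound index design, query optimization, and a correct Spring Boot integration, you can build high-performance search and analytics applications.
Key takeaways:
- Understand Elasticsearch's distributed architecture and core concepts
- Master index design and mapping configuration
- Use the Query DSL and the aggregation API fluently
- Configure the Spring Boot integration correctly
- Follow best practices to ensure performance and stability
After working through this guide, you should be able to use Elasticsearch effectively and integrate it into Spring Boot projects to build search and analytics features.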