Elasticsearch: A Complete Guide

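Introduction to Elasticsearch

What Is Elasticsearch?

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It offers near-real-time search and analytics, is the core component of the Elastic Stack (formerly the ELK Stack), and is widely used for log analysis, full-text search, and monitoring.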

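Key Features

  • Distributed architecture: scales horizontally, with automatic sharding and replication
  • Near-real-time search: newly indexed documents become searchable after the next refresh (about one second by default)
  • Full-text search: powerful full-text search built on Lucene
  • RESTful API: a simple HTTP/JSON interface
  • Multi-tenancy: many indices can live in one cluster (mapping types, which older versions allowed within an index, were removed in 7.x and later)
  • Aggregations: powerful data aggregation and analytics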

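Use Cases

  • Enterprise search
  • Log analysis and monitoring
  • E-commerce product search
  • Content management systems
  • Real-time data analytics
  • Security information analysis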

Core Concepts

Cluster Architecture

Elasticsearch cluster
├── Node - a single ES instance
│   ├── Master node - cluster management
│   ├── Data node - data storage and search
│   └── Coordinating node - request routing
├── Index - logical data container
│   ├── Shard - physical partition of the data
│   │   ├── Primary shard - receives writes first
│   │   └── Replica shard - redundant copy that also serves searches
│   └── Mapping - field type definitions
└── Document - smallest unit of data
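
To make the role of primary shards more concrete, here is a simplified sketch of how a document could be assigned to a shard: hash the routing value (the document _id by default) and take it modulo the number of primary shards. Elasticsearch itself uses a Murmur3 hash and a slightly more involved formula; String.hashCode() and the class name ShardRoutingSketch are stand-ins chosen only so the example runs on its own.

```java
import java.util.List;

// Simplified sketch of document-to-shard routing.
// Assumption: String.hashCode() stands in for the Murmur3 hash Elasticsearch uses.
class ShardRoutingSketch {

    // Pick a primary shard for a routing value: hash modulo shard count.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 3;
        for (String id : List.of("1", "2", "order-1001")) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

Because the target shard depends on the number of primary shards, that number is fixed when the index is created; changing it later means reindexing or using the split/shrink APIs.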

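Data Types

  • Core types: text, keyword, long, integer, double, boolean, date
  • Complex types: object, nested (arrays need no dedicated type; any field can hold multiple values)
  • Geo types: geo_point, geo_shape
  • Specialized types: ip, completion, token_count, percolator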

Usage Tips

1. Index Management

// Create an index with a custom analyzer and explicit mappings
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "tags": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      }
    }
  }
}
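
To see what my_analyzer does to a title before it reaches the inverted index, the following sketch imitates the three stages in plain Java. It is only an approximation: the real standard tokenizer follows Unicode text segmentation rules and the real stop filter has a longer default word list. The class name AnalyzerSketch and the shortened stop-word set are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough imitation of: tokenizer "standard" + filters "lowercase" and "stop".
class AnalyzerSketch {

    // Assumption: a small subset of the default English stop words.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "to", "of", "is", "in");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Stand-in for the standard tokenizer: split on anything that is not a letter or digit
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter produces an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT); // "lowercase" filter
            if (!STOP_WORDS.contains(token)) {           // "stop" filter
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [complete, guide, elasticsearch]
        System.out.println(analyze("The Complete Guide to Elasticsearch"));
    }
}
```

The title.keyword sub-field in the mapping skips this pipeline entirely and stores the original string, which is why it suits exact matching, sorting, and terms aggregations.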

2. Search Queries

// Combined query: scored full-text match, unscored filters, aggregations, sorting, and paging
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch tutorial",
            "fields": ["title^2", "content"],
            "type": "best_fields",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "created_at": {
              "gte": "2023-01-01",
              "lte": "2023-12-31"
            }
          }
        },
        {
          "terms": {
            "tags": ["search", "tutorial"]
          }
        }
      ]
    }
  },
  "aggs": {
    "tag_counts": {
      "terms": {
        "field": "tags",
        "size": 10
      }
    },
    "date_histogram": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      }
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 20
}

3. Aggregations

// Nested aggregations: per-category totals, averages, and a monthly trend, plus global statistics
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "sales_by_category": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "amount"
          }
        },
        "avg_order_value": {
          "avg": {
            "field": "amount"
          }
        },
        "sales_trend": {
          "date_histogram": {
            "field": "order_date",
            "calendar_interval": "month"
          },
          "aggs": {
            "monthly_sales": {
              "sum": {
                "field": "amount"
              }
            }
          }
        }
      }
    },
    "global_sales_stats": {
      "global": {},
      "aggs": {
        "total_revenue": {
          "sum": {
            "field": "amount"
          }
        },
        "avg_order_value": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}
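
If the aggregation DSL feels abstract, it can help to see what the terms bucket with its sum and avg sub-aggregations computes, expressed over an in-memory list. This is only a sketch of the semantics, not of how Elasticsearch executes it; the Order class and the sample values in main are made up for the example.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory equivalent of: terms on "category" with sum/avg sub-aggregations on "amount".
class AggregationSketch {

    // Hypothetical order document with just the two fields the aggregation reads
    static final class Order {
        final String category;
        final double amount;

        Order(String category, double amount) {
            this.category = category;
            this.amount = amount;
        }
    }

    static Map<String, DoubleSummaryStatistics> salesByCategory(List<Order> orders) {
        // groupingBy plays the role of the terms bucket; summarizingDouble yields sum and average
        return orders.stream().collect(Collectors.groupingBy(
            (Order o) -> o.category,
            TreeMap::new,
            Collectors.summarizingDouble((Order o) -> o.amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("books", 10.0), new Order("books", 30.0), new Order("toys", 5.0));
        salesByCategory(orders).forEach((category, stats) ->
            System.out.println(category + " total=" + stats.getSum() + " avg=" + stats.getAverage()));
    }
}
```

Letting Elasticsearch do this work on the shards avoids shipping every order document to the application first.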

4. Bulk Operations

// Bulk-index documents (one action line followed by one source line per document)
POST /_bulk
{"index":{"_index":"my_index","_id":"1"}}
{"title":"Elasticsearch Guide","content":"Complete guide to ES","tags":["search","guide"]}
{"index":{"_index":"my_index","_id":"2"}}
{"title":"Spring Boot Integration","content":"How to integrate ES with Spring Boot","tags":["spring","integration"]}

// Bulk partial updates
POST /_bulk
{"update":{"_index":"my_index","_id":"1"}}
{"doc":{"views":100,"updated_at":"2023-12-01"}}
{"update":{"_index":"my_index","_id":"2"}}
{"doc":{"views":50,"updated_at":"2023-12-01"}}
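
The _bulk body is newline-delimited JSON, and every line, including the last, has to end with a newline. A minimal sketch of assembling such a body by hand might look like the following. It assumes the index name and ids contain no characters that need JSON escaping and that each source document is already serialized on a single line; BulkBodySketch and bulkIndexBody are illustrative names, not part of any client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the NDJSON payload for POST /_bulk with "index" actions.
class BulkBodySketch {

    // jsonById maps a document id to its single-line JSON source.
    static String bulkIndexBody(String index, Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : jsonById.entrySet()) {
            // Action line
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // Source line; the trailing newline is mandatory, also after the last document
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"title\":\"Elasticsearch Guide\"}");
        docs.put("2", "{\"title\":\"Spring Boot Integration\"}");
        System.out.print(bulkIndexBody("my_index", docs));
    }
}
```

In a real application a client library would normally build this payload, but the format is worth knowing when debugging a rejected bulk request.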

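Key Challenges Explained

1. Sharding Strategy

// Estimating the number of primary shards (rules of thumb, not hard limits)
// By data volume: shards = total data size / target shard size (30-50 GB suggested)
// By hardware:    shards = CPU cores * 2 (a rough heuristic)

// Shard allocation filtering: keep this index on nodes tagged box_type=hot.
// This controls which nodes hold the shards; it is not a routing (shard) key.
PUT /logs/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}

// Force-merge to a single segment. Run this only on indices that no longer receive writes.
// It reduces the segment count to speed up searches; it does not warm the shard.
POST /logs/_forcemerge?max_num_segments=1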
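
The data-volume rule above is easy to turn into a quick calculation. The sketch below rounds up so that no shard exceeds the target size; the class and method names and the sample figures are assumptions for illustration, and the result is a starting point rather than a guarantee of good performance.

```java
// Estimates the primary shard count from expected data volume and a target shard size.
class ShardSizingSketch {

    static int primaryShards(double expectedDataGb, double targetShardGb) {
        if (targetShardGb <= 0) {
            throw new IllegalArgumentException("targetShardGb must be positive");
        }
        // Round up so that no shard exceeds the target size; always use at least one shard
        return Math.max(1, (int) Math.ceil(expectedDataGb / targetShardGb));
    }

    public static void main(String[] args) {
        // 500 GB of data at roughly 40 GB per shard -> 13 primary shards
        System.out.println(primaryShards(500, 40));
    }
}
```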

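2. Performance Optimization

// Query optimization: a filter inside constant_score skips scoring and can be cached
GET /my_index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  },
  "_source": ["title", "content"],
  "size": 1000
}
// "_source" returns only the fields you need.
// "size" fetches one large page; for deeper result sets, use search_after rather than a large "from" offset.

// Index settings for heavy bulk loading (restore the normal values afterwards)
PUT /my_index/_settings
{
  "index.refresh_interval": "30s",
  "index.number_of_replicas": 0,
  "index.translog.durability": "async"
}
// refresh_interval: refresh less often, so new documents take longer to become searchable.
// number_of_replicas: 0 speeds up writes but leaves no redundancy until replicas are restored.
// translog.durability: async syncs the translog in the background, so recent writes can be lost on a crash.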

3. Cluster Management

// Cluster health
GET /_cluster/health?pretty

// Node statistics
GET /_nodes/stats?pretty

// Index statistics
GET /_stats?pretty

// Enable shard allocation (for example, after a rolling restart)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

4. Data Modeling

// Parent-child modeling with a join field (company is the parent, employee is the child)
PUT /company
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "join_field": {
        "type": "join",
        "relations": {
          "company": "employee"
        }
      }
    }
  }
}

// Nested objects: each variant is indexed as its own hidden document
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size": { "type": "keyword" },
          "price": { "type": "double" }
        }
      }
    }
  }
}
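
Why bother with nested instead of a plain object field? With object, the values of an array of objects are flattened into independent per-field lists, so a query for "red AND size L" can match a product that only has a red S and a blue L. The sketch below imitates both behaviors in memory; the Variant class and the sample data are invented for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Contrasts flattened "object" matching with per-object "nested" matching.
class NestedVsObjectSketch {

    static final class Variant {
        final String color;
        final String size;

        Variant(String color, String size) {
            this.color = color;
            this.size = size;
        }
    }

    // "object" semantics: each field becomes an independent list of values
    static boolean matchesFlattened(List<Variant> variants, String color, String size) {
        List<String> colors = variants.stream().map(v -> v.color).collect(Collectors.toList());
        List<String> sizes = variants.stream().map(v -> v.size).collect(Collectors.toList());
        return colors.contains(color) && sizes.contains(size);
    }

    // "nested" semantics: both conditions must hold within the same variant
    static boolean matchesNested(List<Variant> variants, String color, String size) {
        return variants.stream().anyMatch(v -> v.color.equals(color) && v.size.equals(size));
    }

    public static void main(String[] args) {
        List<Variant> variants = List.of(new Variant("red", "S"), new Variant("blue", "L"));
        System.out.println(matchesFlattened(variants, "red", "L")); // true: a false positive
        System.out.println(matchesNested(variants, "red", "L"));    // false: no single variant is red and L
    }
}
```

The price of nested is that such fields have to be queried with a nested query and each nested object counts as an extra Lucene document.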

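Spring Boot Integration

The Java examples in this section use ElasticsearchRestTemplate, NativeSearchQueryBuilder, and RestHighLevelClient, which belong to Spring Data Elasticsearch 4.x (Spring Boot 2.x). These classes are deprecated from 4.4 onward in favor of the new Elasticsearch Java API client, so newer versions need adapted code.

1. Dependencies

<!-- Maven -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

// Gradle
implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'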

2. Configuration File

# application.yml
spring:
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 1s
    socket-timeout: 30s

    # Security (uncomment when authentication is enabled)
    # username: elastic
    # password: changeme

    # Multi-node cluster: list the nodes, comma-separated
    # uris: http://es-node1:9200,http://es-node2:9200,http://es-node3:9200

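3. Entity Class

import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "articles")
@Setting(settingPath = "es-settings.json")
public class Article {

    @Id
    private String id;

    // ik_max_word comes from the IK analysis plugin, which must be installed on every node
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String content;

    @Field(type = FieldType.Keyword)
    private List<String> tags;

    @Field(type = FieldType.Keyword)
    private String category;

    // java.time fields should declare a date format that matches the Java type
    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second)
    private LocalDateTime createdAt;

    @Field(type = FieldType.Long)
    private Long views;

    // Constructors, getters, and setters omitted
}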

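4. Repository Interface

import java.time.LocalDateTime;
import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface ArticleRepository extends ElasticsearchRepository<Article, String> {

    // Derived query methods: Spring Data builds the query from the method name
    List<Article> findByTitleContaining(String title);

    List<Article> findByCategoryAndTagsIn(String category, List<String> tags);

    Page<Article> findByCreatedAtBetween(
        LocalDateTime start,
        LocalDateTime end,
        Pageable pageable
    );

    // Custom query with @Query; ?0 is replaced by the first method argument
    @Query("{\"bool\": {\"must\": [{\"match\": {\"title\": \"?0\"}}]}}")
    List<Article> searchByTitle(String title);

    // Aggregations cannot be declared with @Query, because the string is used only as
    // the "query" part of the request. Run them through the template instead, as
    // ArticleService.getArticleStats() does.
}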

5. Service Layer

@Service
public class ArticleService {

    @Autowired
    private ArticleRepository articleRepository;

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    // Basic CRUD operations
    public Article createArticle(Article article) {
        article.setCreatedAt(LocalDateTime.now());
        article.setViews(0L);
        return articleRepository.save(article);
    }
  
    public Article findById(String id) {
        return articleRepository.findById(id)
            .orElseThrow(() -> new ArticleNotFoundException("Article not found"));
    }
  
    // Full-text search with filters. SearchRequest here is the application's own
    // request DTO, not org.elasticsearch.action.search.SearchRequest.
    public SearchPage<Article> searchArticles(SearchRequest request, Pageable pageable) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();

        // Keyword search over title (boosted x2) and content
        if (StringUtils.hasText(request.getKeyword())) {
            queryBuilder.must(QueryBuilders.multiMatchQuery(request.getKeyword())
                .field("title", 2.0f)
                .field("content")
                .type(MultiMatchQueryBuilder.Type.BEST_FIELDS)
                .fuzziness(Fuzziness.AUTO));
        }

        // Category filter
        if (StringUtils.hasText(request.getCategory())) {
            queryBuilder.filter(QueryBuilders.termQuery("category", request.getCategory()));
        }

        // Tag filter
        if (request.getTags() != null && !request.getTags().isEmpty()) {
            queryBuilder.filter(QueryBuilders.termsQuery("tags", request.getTags()));
        }

        // Date range filter
        if (request.getStartDate() != null && request.getEndDate() != null) {
            queryBuilder.filter(QueryBuilders.rangeQuery("createdAt")
                .gte(request.getStartDate())
                .lte(request.getEndDate()));
        }

        // Build the search query
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .withSort(Sort.by(Sort.Direction.DESC, "_score"))
            .withSort(Sort.by(Sort.Direction.DESC, "createdAt"))
            .withPageable(pageable)
            .build();

        // search() returns SearchHits; wrap it in a SearchPage to expose paging metadata
        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);
        return SearchHitSupport.searchPageFor(searchHits, pageable);
    }
  
    // Aggregation statistics
    public Map<String, Object> getArticleStats() {
        // Article count per category
        TermsAggregationBuilder categoryAgg = AggregationBuilders
            .terms("category_stats")
            .field("category")
            .size(10);
      
        // Article count per tag (top 20)
        TermsAggregationBuilder tagAgg = AggregationBuilders
            .terms("tag_stats")
            .field("tags")
            .size(20);
      
        // Monthly trend by creation date
        DateHistogramAggregationBuilder timeAgg = AggregationBuilders
            .dateHistogram("time_trend")
            .field("createdAt")
            .calendarInterval(DateHistogramInterval.MONTH);
      
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.matchAllQuery())
            .addAggregation(categoryAgg)
            .addAggregation(tagAgg)
            .addAggregation(timeAgg)
            .build();
      
        SearchHits<Article> searchHits = elasticsearchTemplate.search(searchQuery, Article.class);
      
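        // Note: getAggregations().get(...) as used below matches Spring Data Elasticsearch 4.0-4.2;
        // from 4.3 on, getAggregations() returns an AggregationsContainer that must be unwrapped first.
        Map<String, Object> stats = new HashMap<>();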
        stats.put("category_stats", searchHits.getAggregations().get("category_stats"));
        stats.put("tag_stats", searchHits.getAggregations().get("tag_stats"));
        stats.put("time_trend", searchHits.getAggregations().get("time_trend"));
      
        return stats;
    }
  
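    // Bulk indexing. Elasticsearch has no transactions, so @Transactional would have no effect here.
    public void bulkIndexArticles(List<Article> articles) {
        if (articles.isEmpty()) {
            return; // nothing to send
        }

        List<IndexQuery> queries = articles.stream()
            .map(article -> {
                article.setCreatedAt(LocalDateTime.now());
                article.setViews(0L);
                return new IndexQueryBuilder().withObject(article).build();
            })
            .collect(Collectors.toList());

        // Sends all documents in a single _bulk request
        elasticsearchTemplate.bulkIndex(queries, Article.class);
    }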
  
    // Increment the view counter in place with a Painless script
    public void updateArticleViews(String id) {
        UpdateQuery updateQuery = UpdateQuery.builder(id)
            .withScript("ctx._source.views += 1")
            .withLang("painless")
            .build();

        elasticsearchTemplate.update(updateQuery, IndexCoordinates.of("articles"));
    }
}
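
The service refers to two types that the listing does not define: the SearchRequest parameter object and ArticleNotFoundException. A minimal sketch of both, inferred only from the getters and the constructor the service calls, could look like this. The field types are assumptions; in particular, startDate and endDate are guessed to be LocalDateTime because they are compared against createdAt.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical request DTO; the field names follow the getters used in ArticleService.
class SearchRequest {
    private String keyword;
    private String category;
    private List<String> tags = new ArrayList<>();
    private LocalDateTime startDate;
    private LocalDateTime endDate;

    public String getKeyword() { return keyword; }
    public void setKeyword(String keyword) { this.keyword = keyword; }

    public String getCategory() { return category; }
    public void setCategory(String category) { this.category = category; }

    public List<String> getTags() { return tags; }
    public void setTags(List<String> tags) { this.tags = tags; }

    public LocalDateTime getStartDate() { return startDate; }
    public void setStartDate(LocalDateTime startDate) { this.startDate = startDate; }

    public LocalDateTime getEndDate() { return endDate; }
    public void setEndDate(LocalDateTime endDate) { this.endDate = endDate; }
}

// Hypothetical unchecked exception thrown when no article exists for an id.
class ArticleNotFoundException extends RuntimeException {
    ArticleNotFoundException(String message) {
        super(message);
    }
}
```

In a real project each class would be public and live in its own file. Since the DTO shares its simple name with the Elasticsearch client's own SearchRequest class, a more specific name such as ArticleSearchRequest might avoid import mix-ups.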

6. Controller Layer

@RestController
@RequestMapping("/api/articles")
public class ArticleController {
  
    @Autowired
    private ArticleService articleService;
  
    @PostMapping
    public ResponseEntity<Article> createArticle(@RequestBody Article article) {
        Article createdArticle = articleService.createArticle(article);
        return ResponseEntity.status(HttpStatus.CREATED).body(createdArticle);
    }
  
    @GetMapping("/{id}")
    public ResponseEntity<Article> getArticleById(@PathVariable String id) {
        Article article = articleService.findById(id);
        return ResponseEntity.ok(article);
    }
  
    @GetMapping("/search")
    public ResponseEntity<SearchPage<Article>> searchArticles(
            @ModelAttribute SearchRequest request,
            @PageableDefault(sort = "createdAt", direction = Sort.Direction.DESC) Pageable pageable) {
      
        SearchPage<Article> articles = articleService.searchArticles(request, pageable);
        return ResponseEntity.ok(articles);
    }
  
    @GetMapping("/stats")
    public ResponseEntity<Map<String, Object>> getArticleStats() {
        Map<String, Object> stats = articleService.getArticleStats();
        return ResponseEntity.ok(stats);
    }
  
    @PostMapping("/bulk")
    public ResponseEntity<Void> bulkIndexArticles(@RequestBody List<Article> articles) {
        articleService.bulkIndexArticles(articles);
        return ResponseEntity.ok().build();
    }
  
    @PutMapping("/{id}/views")
    public ResponseEntity<Void> incrementViews(@PathVariable String id) {
        articleService.updateArticleViews(id);
        return ResponseEntity.ok().build();
    }
}

7. Configuration Class

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.example.repository")
public class ElasticsearchConfig extends AbstractElasticsearchConfiguration {
  
    @Value("${spring.elasticsearch.uris}")
    private String elasticsearchUrl;
  
    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
            .connectedTo(elasticsearchUrl.replace("http://", ""))
            .withConnectTimeout(Duration.ofSeconds(5))
            .withSocketTimeout(Duration.ofSeconds(30))
            .build();
      
        return RestClients.create(clientConfiguration).rest();
    }
  
    @Bean
    public ElasticsearchRestTemplate elasticsearchRestTemplate() {
        return new ElasticsearchRestTemplate(elasticsearchClient());
    }
  
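    // Custom converters: override the hook provided by the base configuration so that the
    // mapping converter picks them up. The two converter classes are application-defined
    // (not part of Spring Data) and are only needed when no date format is declared on the field.
    @Bean
    @Override
    public ElasticsearchCustomConversions elasticsearchCustomConversions() {
        return new ElasticsearchCustomConversions(Arrays.asList(
            new LocalDateTimeToDateConverter(),
            new DateToLocalDateTimeConverter()
        ));
    }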
}

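Best Practices

1. Index Design

  • Shard count: size it according to data volume and hardware resources
  • Replica count: at least one replica in production
  • Mapping design: choose field types and analyzers deliberately
  • Index lifecycle: use ILM (index lifecycle management) to manage it

2. Query Optimization

  • Use filter context to skip scoring where relevance is not needed
  • Use _source filtering to reduce network transfer
  • Avoid deep pagination; use search_after instead
  • Use aggregations instead of computing in the application layer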
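
To illustrate the search_after advice: instead of a growing "from" offset, each request repeats the same sort and passes the sort values of the last hit from the previous page. The sketch below builds such a request body as a string. It assumes a sort on created_at plus a unique tiebreaker; the keyword field "id" is hypothetical and not part of the mappings shown earlier, and the class and method names are illustrative.

```java
// Builds a search body that pages with search_after instead of from/size offsets.
class SearchAfterSketch {

    // lastCreatedAtMillis and lastId are the "sort" values of the last hit on the
    // previous page; pass null for both to request the first page.
    static String pageBody(int size, Long lastCreatedAtMillis, String lastId) {
        StringBuilder body = new StringBuilder();
        body.append("{\"size\":").append(size)
            // Assumption: "id" is a unique keyword field used as a tiebreaker
            .append(",\"sort\":[{\"created_at\":\"desc\"},{\"id\":\"asc\"}]");
        if (lastCreatedAtMillis != null && lastId != null) {
            body.append(",\"search_after\":[").append(lastCreatedAtMillis)
                .append(",\"").append(lastId).append("\"]");
        }
        return body.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(pageBody(20, null, null));               // first page
        System.out.println(pageBody(20, 1701388800000L, "a-17"));   // next page
    }
}
```

The tiebreaker matters: without a unique last sort key, documents that share the same created_at value can be skipped or repeated between pages.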

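3. Performance Tuning

  • Tune refresh_interval to balance freshness against indexing throughput
  • Use the bulk API for batch operations
  • Keep shard sizes reasonable (30-50 GB)
  • Monitor cluster health

4. Security

  • Enable the Elastic Stack security features (formerly X-Pack)
  • Configure TLS/SSL encryption
  • Define users, roles, and permissions
  • Back up data regularly (for example, with snapshots)

5. Monitoring and Maintenance

  • Monitor cluster health
  • Set up alerting
  • Regularly clean up unused indices
  • Monitor query performance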

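Summary

Elasticsearch is a powerful search and analytics platform with a distributed architecture, near-real-time search, and rich aggregations. With sound index design, query optimization, and a proper Spring Boot integration, you can build high-performance search and analytics applications.

Key takeaways:

  1. Understand Elasticsearch's distributed architecture and core concepts
  2. Master index design and mapping configuration
  3. Become fluent in the query DSL and the aggregation API
  4. Configure the Spring Boot integration correctly
  5. Follow best practices to keep the cluster fast and stable

After working through this guide, you should be able to use Elasticsearch confidently and integrate it into a Spring Boot project to build search and analytics features.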