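This article is for Java developers who already have some Spring Boot experience. It walks through integrating ElasticSearch into an existing Spring Boot + MySQL stack to add efficient search. The code samples rely on ElasticsearchRestTemplate and NativeSearchQueryBuilder, which means Spring Data Elasticsearch 4.x as shipped with Spring Boot 2.x.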
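1. Introduction to ElasticSearch
1.1 What is ElasticSearch?
ElasticSearch is an open-source, distributed, RESTful search engine built on Lucene. It offers multi-tenant full-text search and handles near-real-time search and analytics over large data sets.
Core features:
- Distributed architecture: automatic sharding, data replication, and failover
- Near-real-time search: a document becomes searchable after the next index refresh, which happens once per second by default
- RESTful API: a simple JSON-over-HTTP interface
- Multi-tenancy: many indices can live side by side in one cluster (mapping types were deprecated in 7.x and removed in 8.x)
- Powerful query DSL: rich query and aggregation capabilities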
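To make the "JSON over HTTP" point concrete, here is a minimal sketch that builds (but does not send) a search request with the JDK's own HTTP client types. The class name `SearchRequestSketch`, the `articles` index, and the `localhost:9200` address are illustrative assumptions; only the `/{index}/_search` endpoint and the `match` query shape come from ElasticSearch itself.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SearchRequestSketch {
    /** Builds a POST {baseUrl}/{index}/_search request carrying the given query JSON. */
    static HttpRequest build(String baseUrl, String index, String queryJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    public static void main(String[] args) {
        // A match query on the title field, written by hand as JSON.
        String body = "{\"query\":{\"match\":{\"title\":\"spring boot\"}}}";
        HttpRequest request = build("http://localhost:9200", "articles", body);
        // prints: POST http://localhost:9200/articles/_search
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would take a `java.net.http.HttpClient` and a running node; the Spring Data abstractions used in the rest of the article hide this layer.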
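1.2 Why ElasticSearch?
A traditional relational database such as MySQL runs into bottlenecks in the following scenarios:
- Complex full-text search requirements
- Fuzzy matching and relevance ranking
- Fast retrieval over very large data sets
- Real-time analytics and dashboard displays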
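The reason full-text search is cheap for a Lucene-based engine is the inverted index: each term maps to the documents containing it, so a lookup does not scan every row the way `LIKE '%keyword%'` does in MySQL. The following is a toy sketch of that idea only; the class `InvertedIndexSketch` and its whitespace-style tokenizer are assumptions for illustration and bear no resemblance to Lucene's real data structures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits the text into lowercase tokens and records the document id under each token. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** A single map lookup, independent of how many documents were indexed. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Spring Boot basics");
        index.add(2, "MySQL indexing basics");
        System.out.println(index.search("basics")); // prints: [1, 2]
    }
}
```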
2. Environment Setup and Dependencies
2.1 Add the Maven dependencies
xml
<!-- Spring Boot Starter Data Elasticsearch -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- MySQL driver (newer Spring Boot releases manage it as com.mysql:mysql-connector-j) -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <scope>runtime</scope>
</dependency>
<!-- Spring Data JPA -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
2.2 Configuration file
yaml
# application.yml
spring:
  datasource:
    url: jdbc:mysql://localhost:3306/blog_db
    username: root
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
  jpa:
    hibernate:
      ddl-auto: update
    show-sql: true
  elasticsearch:
    uris: http://localhost:9200
    connection-timeout: 10s
    socket-timeout: 30s
# Custom settings
app:
  elasticsearch:
    index-settings:
      number-of-shards: 3
      number-of-replicas: 1
3. Data Model Design
3.1 MySQL entity class
java
import java.time.LocalDateTime;
import javax.persistence.*; // jakarta.persistence.* on Spring Boot 3

@Entity
@Table(name = "articles")
public class Article {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(nullable = false, length = 500)
    private String title;

    @Column(columnDefinition = "TEXT")
    private String content;

    @Column(name = "author_id")
    private Long authorId;

    @Column(name = "create_time")
    private LocalDateTime createTime;

    @Column(name = "update_time")
    private LocalDateTime updateTime;

    // getters and setters
}
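3.2 ElasticSearch document class
java
import java.time.LocalDateTime;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "articles")
public class ArticleDocument {
    @Id
    private String id;

    // ik_max_word and ik_smart require the IK analysis plugin on every ElasticSearch node
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String title;

    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String content;

    @Field(type = FieldType.Long)
    private Long authorId;

    // Keyword fields are not analyzed, so they suit exact matching and aggregations
    @Field(type = FieldType.Keyword)
    private String authorName;

    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second)
    private LocalDateTime createTime;

    // getters and setters
}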
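The `date_hour_minute_second` format used on `createTime` corresponds to the pattern `uuuu-MM-dd'T'HH:mm:ss`, with no fractional seconds and no time zone. As a small sketch in plain Java (the class name `DateFormatSketch` is an assumption for illustration), this is the string such a field value takes in the indexed JSON:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class DateFormatSketch {
    // Pattern behind ElasticSearch's built-in date_hour_minute_second format.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    /** Renders a timestamp the way the createTime field would appear in the document. */
    static String toIndexValue(LocalDateTime time) {
        return time.format(FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime time = LocalDateTime.of(2024, 1, 15, 10, 30, 45, 123_000_000);
        // prints: 2024-01-15T10:30:45 (the milliseconds are dropped)
        System.out.println(toIndexValue(time));
    }
}
```

Because sub-second precision is lost, two articles created within the same second sort as equal on this field.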
4. Data Synchronization Strategies
4.1 Dual-write mode
java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@Transactional // covers MySQL only; ElasticSearch writes are not rolled back with it
public class ArticleService {
    // ArticleRepository is the Spring Data JPA repository for Article (not shown)
    private final ArticleRepository articleRepository;
    private final ArticleSearchRepository articleSearchRepository;

    public ArticleService(ArticleRepository articleRepository,
                          ArticleSearchRepository articleSearchRepository) {
        this.articleRepository = articleRepository;
        this.articleSearchRepository = articleSearchRepository;
    }

    public Article createArticle(Article article) {
        // Save to MySQL
        Article savedArticle = articleRepository.save(article);
        // Sync to ElasticSearch; a runtime exception here rolls back the MySQL insert
        ArticleDocument document = convertToDocument(savedArticle);
        articleSearchRepository.save(document);
        return savedArticle;
    }

    public void updateArticle(Long id, Article article) {
        // Update MySQL (save() overwrites every column with the values in `article`)
        article.setId(id);
        Article updatedArticle = articleRepository.save(article);
        // Update ElasticSearch
        ArticleDocument document = convertToDocument(updatedArticle);
        articleSearchRepository.save(document);
    }

    private ArticleDocument convertToDocument(Article article) {
        ArticleDocument document = new ArticleDocument();
        document.setId(article.getId().toString());
        document.setTitle(article.getTitle());
        document.setContent(article.getContent());
        document.setAuthorId(article.getAuthorId());
        document.setCreateTime(article.getCreateTime());
        // authorName is not stored on Article; it has to be looked up separately if needed
        return document;
    }
}
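Dual writes can drift apart when the second write fails, because MySQL and ElasticSearch do not share a transaction. One common mitigation is to queue failed index writes and retry them later. The sketch below shows only that control flow with in-memory stand-ins; `DualWriteSketch`, the two `Consumer` parameters, and `retryPending` are hypothetical names, and a real system would persist the queue rather than keep it in memory.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

class DualWriteSketch {
    private final Consumer<String> primaryStore; // stands in for articleRepository.save
    private final Consumer<String> searchIndex;  // stands in for articleSearchRepository.save
    private final Queue<String> retryQueue = new ArrayDeque<>();

    DualWriteSketch(Consumer<String> primaryStore, Consumer<String> searchIndex) {
        this.primaryStore = primaryStore;
        this.searchIndex = searchIndex;
    }

    /** Writes to the primary store first; an index failure is queued instead of failing the call. */
    void save(String articleId) {
        primaryStore.accept(articleId); // a failure here propagates and nothing is indexed
        try {
            searchIndex.accept(articleId);
        } catch (RuntimeException e) {
            retryQueue.add(articleId);  // the index is stale until a retry succeeds
        }
    }

    /** Re-attempts every queued index write once; returns how many are still pending. */
    int retryPending() {
        int attempts = retryQueue.size();
        for (int i = 0; i < attempts; i++) {
            String id = retryQueue.poll();
            try {
                searchIndex.accept(id);
            } catch (RuntimeException e) {
                retryQueue.add(id);
            }
        }
        return retryQueue.size();
    }
}
```

The trade-off is that search results may lag behind MySQL for a while, in exchange for writes that keep succeeding while the search cluster is unavailable.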
4.2 Synchronizing data with Logstash
For existing data or large data volumes, Logstash is the recommended way to synchronize:
ruby
# mysql-to-es.conf
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java-8.0.23.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/blog_db"
    jdbc_user => "root"
    jdbc_password => "password"
    # Run once a minute
    schedule => "* * * * *"
    # update_time must be selected because it is the tracking column
    statement => "SELECT id, title, content, author_id, create_time, update_time FROM articles WHERE update_time > :sql_last_value"
    use_column_value => true
    tracking_column => "update_time"
    # The default tracking type is numeric, so a timestamp column has to be declared
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "articles"
    document_id => "%{id}"
  }
}
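The `:sql_last_value` placeholder together with `tracking_column` amounts to a checkpoint: each run fetches only the rows changed since the previous run and then moves the checkpoint forward. Here is a minimal in-memory sketch of that loop, assuming a hypothetical `Row` type and a list standing in for the `articles` table; it illustrates the idea and is not how Logstash is implemented.

```java
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class IncrementalSyncSketch {
    static class Row {
        final long id;
        final LocalDateTime updateTime;

        Row(long id, LocalDateTime updateTime) {
            this.id = id;
            this.updateTime = updateTime;
        }
    }

    // Plays the role of :sql_last_value; starts far enough back to pick up every row.
    private LocalDateTime lastValue = LocalDateTime.of(1970, 1, 1, 0, 0);

    /** Returns the rows changed since the previous poll and advances the checkpoint. */
    List<Row> poll(List<Row> table) {
        List<Row> changed = table.stream()
                .filter(row -> row.updateTime.isAfter(lastValue)) // update_time > :sql_last_value
                .sorted(Comparator.comparing((Row row) -> row.updateTime))
                .collect(Collectors.toList());
        if (!changed.isEmpty()) {
            lastValue = changed.get(changed.size() - 1).updateTime;
        }
        return changed;
    }
}
```

Two limits of this approach are visible in the sketch: a row deleted from MySQL never shows up in a poll, and with a strict `>` comparison a row committed later with the same timestamp as the checkpoint is skipped.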
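5. Implementing ElasticSearch Queries
5.1 Repository layer
java
import java.util.List;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ArticleSearchRepository extends ElasticsearchRepository<ArticleDocument, String> {
    // Simple derived query
    List<ArticleDocument> findByTitle(String title);

    // Derived "containing" query over title or content
    List<ArticleDocument> findByTitleContainingOrContentContaining(String title, String content);

    // Custom query via @Query; ?0 is replaced by the first method argument
    @Query("""
        {
          "bool": {
            "should": [
              { "match": { "title": "?0" } },
              { "match": { "content": "?0" } }
            ]
          }
        }
        """)
    List<ArticleDocument> findByCustomQuery(String keyword);

    // Paged query
    Page<ArticleDocument> findByTitleContaining(String title, Pageable pageable);
}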
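5.2 Service for complex queries
java
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

@Service
public class ArticleSearchService {
    private final ElasticsearchRestTemplate elasticsearchTemplate;

    public ArticleSearchService(ElasticsearchRestTemplate elasticsearchTemplate) {
        this.elasticsearchTemplate = elasticsearchTemplate;
    }

    // ArticleSearchRequest is a plain request DTO (keyword, authorId, startTime, endTime, page, size); not shown
    public SearchHits<ArticleDocument> advancedSearch(ArticleSearchRequest request) {
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Build the bool query
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        if (StringUtils.hasText(request.getKeyword())) {
            // must clauses contribute to the relevance score
            boolQuery.must(QueryBuilders.multiMatchQuery(request.getKeyword(), "title", "content")
                    .analyzer("ik_smart"));
        }
        if (request.getAuthorId() != null) {
            // filter clauses restrict the result set without affecting the score
            boolQuery.filter(QueryBuilders.termQuery("authorId", request.getAuthorId()));
        }
        if (request.getStartTime() != null && request.getEndTime() != null) {
            boolQuery.filter(QueryBuilders.rangeQuery("createTime")
                    .gte(request.getStartTime())
                    .lte(request.getEndTime()));
        }
        queryBuilder.withQuery(boolQuery);
        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title").preTags("<em>").postTags("</em>");
        highlightBuilder.field("content").preTags("<em>").postTags("</em>");
        queryBuilder.withHighlightBuilder(highlightBuilder);
        // Paging and sorting by relevance score
        queryBuilder.withPageable(PageRequest.of(request.getPage(), request.getSize()));
        queryBuilder.withSort(SortBuilders.scoreSort());
        NativeSearchQuery searchQuery = queryBuilder.build();
        return elasticsearchTemplate.search(searchQuery, ArticleDocument.class);
    }
}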
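It can help to see the query DSL that a `multi_match` clause plus an `authorId` filter corresponds to. The sketch below assembles roughly that JSON by hand, leaving out the date range, highlighting, paging, and sorting. `BoolQueryJsonSketch` and its deliberately minimal `escape` helper are assumptions for illustration; real code should use a JSON library, and the exact request the Spring builders emit contains additional default fields.

```java
class BoolQueryJsonSketch {
    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a bool query: multi_match in must, optional term on authorId in filter. */
    static String build(String keyword, Long authorId) {
        StringBuilder json = new StringBuilder("{\"query\":{\"bool\":{");
        json.append("\"must\":[{\"multi_match\":{\"query\":\"").append(escape(keyword))
            .append("\",\"fields\":[\"title\",\"content\"],\"analyzer\":\"ik_smart\"}}]");
        if (authorId != null) {
            json.append(",\"filter\":[{\"term\":{\"authorId\":").append(authorId).append("}}]");
        }
        return json.append("}}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(build("spring boot", 42L));
    }
}
```

Posting the resulting string to `/articles/_search` is one way to try the query against a node directly, outside the application.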
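6. Glossary
6.1 Core concepts
Index
- A collection of documents. It is often compared to a MySQL database, although now that mapping types are gone a single index behaves more like a single table
- An index could once hold several types; types were deprecated in 7.x and removed in 8.x
Document
- The basic unit in an index, comparable to a row in MySQL
- Stored as JSON
Shard
- A partition of an index, used to split data horizontally
- Primary shard: holds the primary copy of the data
- Replica shard: a copy of a primary shard that provides high availability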
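A document is assigned to a primary shard by hashing its routing value (the document id by default) and taking the result modulo the number of primary shards. That is why the primary shard count is fixed when an index is created. The sketch below is deliberately simplified: ElasticSearch uses a Murmur3 hash rather than `String.hashCode()`, and `ShardRoutingSketch` is a hypothetical name.

```java
class ShardRoutingSketch {
    /** Simplified routing: hash of the routing value modulo the number of primary shards. */
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        // The same id can land on a different shard once the shard count changes,
        // so existing documents would no longer be found where lookups expect them.
        System.out.println(shardFor("1", 3)); // prints: 1
        System.out.println(shardFor("1", 4)); // prints: 1
        System.out.println(shardFor("1", 5)); // prints: 4
    }
}
```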
Mapping
- Defines the data type and properties of each field in a document
- Comparable to a table schema definition in MySQL
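6.2 Query types
Match Query
- Analyzes the query text into tokens before matching
- Supports fuzzy matching (through the fuzziness parameter) and relevance scoring
Term Query
- Exact match; the query string is not analyzed
- Typically used on keyword fields
Bool Query
- Combines several query clauses with boolean logic
- Supports must (has to match), should (ought to match), must_not (has to not match), and filter (has to match, without contributing to the score)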
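The difference between a match query and a term query comes down to whether the query string is analyzed before it is compared with the indexed tokens. The toy model below makes that visible; `MatchVsTermSketch` and its lowercase-and-split analyzer are assumptions for illustration, far simpler than a real analyzer such as IK.

```java
import java.util.ArrayList;
import java.util.List;

class MatchVsTermSketch {
    /** Toy analyzer: lowercase the text and split it on non-word characters. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** term: the query string is compared as-is against the indexed tokens. */
    static boolean term(List<String> indexedTokens, String query) {
        return indexedTokens.contains(query);
    }

    /** match: the query string is analyzed first; any shared token counts as a hit. */
    static boolean match(List<String> indexedTokens, String query) {
        for (String token : analyze(query)) {
            if (indexedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Spring Boot Guide"); // [spring, boot, guide]
        System.out.println(term(indexed, "Spring Boot"));  // prints: false
        System.out.println(match(indexed, "Spring Boot")); // prints: true
    }
}
```

This is also why a term query on an analyzed text field often returns nothing: the indexed tokens are lowercased and split, while the query string is not.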
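7. Performance Tuning Suggestions
7.1 Index tuning
java
import java.util.Map;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.IndexOperations;
import org.springframework.data.elasticsearch.core.document.Document;

@Configuration
public class ElasticsearchConfig {
    @Value("${app.elasticsearch.index-settings.number-of-shards:3}")
    private int numberOfShards;
    @Value("${app.elasticsearch.index-settings.number-of-replicas:1}")
    private int numberOfReplicas;

    // Applies the configured settings when the index is first created.
    // Assumes ArticleDocument uses @Document(createIndex = false); otherwise the
    // repository creates the index with default settings before this runs.
    @Bean
    public ApplicationRunner articleIndexInitializer(ElasticsearchOperations operations) {
        return args -> {
            IndexOperations indexOps = operations.indexOps(ArticleDocument.class);
            if (!indexOps.exists()) {
                Map<String, Object> settings = Map.of(
                        "index.number_of_shards", numberOfShards,
                        "index.number_of_replicas", numberOfReplicas);
                indexOps.create(Document.from(settings));
                indexOps.putMapping(indexOps.createMapping());
            }
        };
    }
}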
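7.2 Query tuning
java
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.FetchSourceFilter;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

@Service
public class OptimizedSearchService {
    private final ElasticsearchRestTemplate elasticsearchTemplate;

    public OptimizedSearchService(ElasticsearchRestTemplate elasticsearchTemplate) {
        this.elasticsearchTemplate = elasticsearchTemplate;
    }

    public SearchHits<ArticleDocument> optimizedSearch(String keyword) {
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.multiMatchQuery(keyword)
                        .field("title", 2.0f) // title matches weigh twice as much as content matches
                        .field("content")
                        .analyzer("ik_smart")
                        .fuzziness("AUTO"))
                .withPageable(PageRequest.of(0, 20))
                // Fetch only the fields the result list needs
                .withSourceFilter(new FetchSourceFilter(new String[]{"id", "title", "authorName"}, null))
                .build();
        query.setTrackTotalHits(false); // skip exact hit counting on large data sets
        return elasticsearchTemplate.search(query, ArticleDocument.class);
    }
}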
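Paging deserves a note alongside these optimizations. A page request such as `PageRequest.of(0, 20)` turns into a `from`/`size` pair, and by default ElasticSearch rejects searches where `from + size` exceeds `index.max_result_window` (10,000). A small guard like the following sketch can catch deep pages before they reach the cluster; `PageWindowSketch` and its method names are hypothetical.

```java
class PageWindowSketch {
    // ElasticSearch's default value for index.max_result_window.
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Offset of the first hit on a zero-based page. */
    static int from(int page, int size) {
        return Math.multiplyExact(page, size);
    }

    /** True when the page lies entirely within the default result window. */
    static boolean fitsDefaultWindow(int page, int size) {
        return (long) page * size + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(from(3, 20));                // prints: 60
        System.out.println(fitsDefaultWindow(499, 20)); // prints: true
        System.out.println(fitsDefaultWindow(500, 20)); // prints: false
    }
}
```

For result sets deeper than the window, `search_after` is the usual alternative to raising the limit.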
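8. Monitoring and Maintenance
8.1 Health check
java
import org.elasticsearch.action.admin.cluster.health.ClusterHealthRequest;
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.cluster.health.ClusterHealthStatus;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.stereotype.Component;

@Component
public class ElasticsearchHealthCheck {
    private final ElasticsearchRestTemplate elasticsearchTemplate;

    public ElasticsearchHealthCheck(ElasticsearchRestTemplate elasticsearchTemplate) {
        this.elasticsearchTemplate = elasticsearchTemplate;
    }

    public boolean isClusterHealthy() {
        try {
            // The callback receives the underlying RestHighLevelClient
            ClusterHealthResponse clusterHealth = elasticsearchTemplate.execute(client ->
                    client.cluster().health(new ClusterHealthRequest(), RequestOptions.DEFAULT));
            // YELLOW (unassigned replicas) still counts as healthy; only RED fails the check
            return clusterHealth.getStatus() != ClusterHealthStatus.RED;
        } catch (Exception e) {
            return false;
        }
    }
}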
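The three cluster statuses follow from shard allocation: red means at least one primary shard is unassigned, yellow means every primary is assigned but at least one replica is not, and green means everything is assigned. The sketch below encodes that rule and the same "anything but red" check; `ClusterStatusSketch` and its method names are hypothetical, and the real status is computed by the cluster, not by the client.

```java
class ClusterStatusSketch {
    enum Status { GREEN, YELLOW, RED }

    /** Derives the status from the number of unassigned primary and replica shards. */
    static Status of(int unassignedPrimaries, int unassignedReplicas) {
        if (unassignedPrimaries > 0) {
            return Status.RED;    // some data is unavailable
        }
        if (unassignedReplicas > 0) {
            return Status.YELLOW; // all data is available, redundancy is reduced
        }
        return Status.GREEN;
    }

    /** Same rule as the health check: anything but RED counts as healthy. */
    static boolean isHealthy(Status status) {
        return status != Status.RED;
    }

    public static void main(String[] args) {
        // A single-node cluster with number-of-replicas: 1 cannot place the replicas.
        System.out.println(of(0, 3)); // prints: YELLOW
    }
}
```

On a single-node development setup, yellow is therefore the expected state whenever the replica count is above zero.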
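8.2 Metrics
java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MetricsController {
    private final ArticleSearchService searchService; // the service from section 5.2
    private final Counter searchRequestCounter;

    public MetricsController(MeterRegistry meterRegistry, ArticleSearchService searchService) {
        this.searchService = searchService;
        this.searchRequestCounter = Counter.builder("es.search.requests")
                .description("Number of search requests")
                .register(meterRegistry);
    }

    @PostMapping("/api/search")
    public ResponseEntity<SearchHits<ArticleDocument>> search(@RequestBody ArticleSearchRequest request) {
        // Count every incoming search request, then run the search
        searchRequestCounter.increment();
        return ResponseEntity.ok(searchService.advancedSearch(request));
    }
}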
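9. Summary
This article showed how to integrate ElasticSearch into a Spring Boot + MySQL stack, covering:
- Dual writes: keeping MySQL and ElasticSearch in sync, on a best-effort basis because the two writes do not share a transaction
- Efficient search: using ElasticSearch's full-text search capabilities
- Complex queries: multi-condition combinations and relevance ranking
- Monitoring: keeping the search service stable
This architecture keeps the transactional guarantees of the relational database while gaining the query performance of a search engine, which makes it a practical choice for modern web applications.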