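ELK from Beginner to Expert: Spring Boot Hands-On Trilogy (Part 3): Advanced Applications and Architecture Design
Series guide: this three-part series teaches you, step by step, how to apply ELK in real Spring Boot projects.
- Part 1: Core Basics and Quick Start
- Part 2: Advanced Features and Performance Tuning
- Part 3: Advanced Applications and Architecture Design (this article)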
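📖 Preface
The first two articles covered ELK's basic operations and performance-tuning techniques. This article goes further into advanced topics: architecture design for large ELK clusters, data collection with Filebeat, and APM (application performance monitoring). Together they help you build an enterprise-grade log analysis platform.
After reading this article you will know:
- ✅ How to design the architecture of a large-scale ELK cluster
- ✅ How to collect data with the lightweight Filebeat shipper
- ✅ How to use APM for performance monitoring and distributed tracing
- ✅ How to run business data analysis (a hands-on case)
- ✅ How to deploy ELK cloud-natively
- ✅ Best practices for production environments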
1. Large-Scale Cluster Architecture Design
1.1 Architecture Evolution Path
```
Stage 1: single node
┌──────────┐
│    ES    │ ← all roles
└──────────┘

Stage 2: small cluster
┌────────┐ ┌────────┐ ┌────────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │ ← Master + Data
└────────┘ └────────┘ └────────┘

Stage 3: role separation
┌────────┐ ┌────────┐ ┌────────┐
│Master 1│ │Master 2│ │Master 3│ ← dedicated master nodes
└────────┘ └────────┘ └────────┘
┌────────┐ ┌────────┐ ┌────────┐
│ Data 1 │ │ Data 2 │ │ Data 3 │ ← dedicated data nodes
└────────┘ └────────┘ └────────┘
          ┌──────────────┐
          │ Coordinating │ ← coordinating node
          └──────────────┘

Stage 4: very large scale
┌─────────────────────────┐
│      Load Balancer      │
└────────────┬────────────┘
             │
    ┌────────┼────────┐
    ▼        ▼        ▼
 ┌──────┐ ┌──────┐ ┌──────┐
 │Coord1│ │Coord2│ │Coord3│
 └──┬───┘ └──┬───┘ └──┬───┘
    │        │        │
    └────────┼────────┘
             │
    ┌────────┼────────┐
    ▼        ▼        ▼
 ┌──────┐ ┌──────┐ ┌──────┐
 │ Hot  │ │ Warm │ │ Cold │ ← hot/warm/cold tiers
 │Nodes │ │Nodes │ │Nodes │
 └──────┘ └──────┘ └──────┘
```
1.2 Node Role Planning
docker-compose.yml (multi-node cluster):
```yaml
version: '3.8'
services:
  # Master nodes
  # xpack.security.enabled=false is for this local demo only: with security on,
  # a multi-node 8.x cluster also needs transport TLS certificates (see the 6.2 checklist)
  es-master1:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-master1
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master2,es-master3
      - cluster.initial_master_nodes=es-master1,es-master2,es-master3
      - node.roles=master
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      - "9201:9200"
    networks:
      - elk-network
  es-master2:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-master2
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master1,es-master3
      - cluster.initial_master_nodes=es-master1,es-master2,es-master3
      - node.roles=master
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    networks:
      - elk-network
  es-master3:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-master3
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master1,es-master2
      - cluster.initial_master_nodes=es-master1,es-master2,es-master3
      - node.roles=master
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    networks:
      - elk-network
  # Data nodes (hot tier)
  es-data-hot1:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-data-hot1
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master1,es-master2,es-master3
      - node.roles=data_hot,data_content
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
    volumes:
      - es-data-hot1:/usr/share/elasticsearch/data
    networks:
      - elk-network
  es-data-hot2:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-data-hot2
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master1,es-master2,es-master3
      - node.roles=data_hot,data_content
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
    volumes:
      - es-data-hot2:/usr/share/elasticsearch/data
    networks:
      - elk-network
  # Data node (warm tier)
  es-data-warm1:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-data-warm1
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master1,es-master2,es-master3
      - node.roles=data_warm
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    volumes:
      - es-data-warm1:/usr/share/elasticsearch/data
    networks:
      - elk-network
  # Coordinating + ingest node (no master or data role);
  # a coordinating-only node would use an empty node.roles list instead
  es-coord1:
    image: elasticsearch:8.11.0
    environment:
      - node.name=es-coord1
      - cluster.name=es-cluster
      - discovery.seed_hosts=es-master1,es-master2,es-master3
      - node.roles=ingest,remote_cluster_client
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    ports:
      - "9200:9200"
    networks:
      - elk-network
volumes:
  es-data-hot1:
  es-data-hot2:
  es-data-warm1:
networks:
  elk-network:
    driver: bridge
```
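Why does the compose file use three dedicated masters? A master is elected, and a cluster-state update is committed, only when a majority of the voting configuration agrees. The sketch below spells out that majority arithmetic. `MasterQuorum` is a hypothetical helper for illustration only and is not an Elasticsearch API.

```java
/** Illustrative majority arithmetic for master-eligible nodes (not an Elasticsearch API). */
final class MasterQuorum {
    private MasterQuorum() {}

    /** Votes needed for a majority among masterEligible voting nodes. */
    static int quorum(int masterEligible) {
        if (masterEligible < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligible / 2 + 1;
    }

    /** How many master-eligible nodes may fail while a majority still survives. */
    static int tolerableFailures(int masterEligible) {
        return masterEligible - quorum(masterEligible);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d master(s) -> quorum %d, tolerates %d failure(s)%n",
                    n, quorum(n), tolerableFailures(n));
        }
    }
}
```

Two masters tolerate zero failures, which is why the checklist in 6.2 asks for at least three. A fourth master adds no extra tolerance. Elasticsearch also leaves one node out of the voting configuration when the master-eligible count is even, so the configuration keeps an odd size.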
1.3 Index Lifecycle Management (ILM)
Hot-warm-cold policy (Kibana Dev Tools console syntax):
```json
// Create the ILM policy.
// The old "freeze" action is deprecated, so the cold phase makes the index read-only instead.
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "readonly": {},
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

// Create an index template bound to the ILM policy
PUT _index_template/logs_template
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "app-logs"
    }
  }
}

// Create the first index and mark it as the write index of the alias
PUT app-logs-000001
{
  "aliases": {
    "app-logs": {
      "is_write_index": true
    }
  }
}
```
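ILM compares each phase's `min_age` with the index's age. For an index that has been rolled over, that age is counted from the rollover time, not from creation. The sketch below mirrors the thresholds of `logs_policy` to show which phase an index of a given age falls into. It is illustrative only: `IlmPhaseSketch` is a hypothetical name, and it does not model ILM itself.

```java
import java.time.Duration;

/** Mirrors the min_age thresholds of logs_policy (illustrative, not the ILM engine). */
final class IlmPhaseSketch {
    enum Phase { HOT, WARM, COLD, DELETE }

    private IlmPhaseSketch() {}

    /** @param ageSinceRollover age counted from rollover (or from creation if never rolled over) */
    static Phase phaseFor(Duration ageSinceRollover) {
        if (ageSinceRollover.compareTo(Duration.ofDays(90)) >= 0) return Phase.DELETE;
        if (ageSinceRollover.compareTo(Duration.ofDays(30)) >= 0) return Phase.COLD;
        if (ageSinceRollover.compareTo(Duration.ofDays(7)) >= 0) return Phase.WARM;
        return Phase.HOT;
    }

    public static void main(String[] args) {
        for (int days : new int[] {0, 6, 7, 29, 30, 89, 90}) {
            System.out.println(days + "d -> " + phaseFor(Duration.ofDays(days)));
        }
    }
}
```

In practice the transitions are not instantaneous. ILM re-checks indices periodically (`indices.lifecycle.poll_interval`, 10 minutes by default).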
2. Filebeat: Lightweight Data Collection
2.1 Installing and Configuring Filebeat
docker-compose.yml:
```yaml
services:
  filebeat:
    image: elastic/filebeat:8.11.0
    container_name: filebeat
    user: root
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./logs:/var/log/app:ro
    # -e logs to stderr; --strict.perms=false skips the config-file ownership check
    command: ["-e", "--strict.perms=false"]
    networks:
      - elk-network
```
filebeat.yml:
```yaml
filebeat.inputs:
  # Application log files
  # (the log input still works in 8.x but is deprecated; filestream is its successor)
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log
    fields:
      app: my-application
      environment: production
    # Plain-text logs: a line starting with a date begins a new event
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

  # Docker container logs
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata: ~

# Ship to Logstash
output.logstash:
  hosts: ["logstash:5044"]

# Or ship directly to Elasticsearch
# (a custom index name also requires setup.template.name and setup.template.pattern)
# output.elasticsearch:
#   hosts: ["http://elasticsearch:9200"]
#   indices:
#     - index: "app-logs-%{+yyyy.MM.dd}"

# Global processors
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  # Expands "message" into top-level fields when it contains a JSON object
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
```
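`decode_json_fields` can parse `message` only when each log line is one complete JSON object. In a Spring Boot app this usually means a JSON encoder for Logback, such as logstash-logback-encoder. The standard-library sketch below only shows the shape of such a line and the escaping it needs. `JsonLogLine` and the field names are assumptions for illustration, not a Filebeat requirement.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds one-line JSON log records (illustrative; use a real JSON encoder in production). */
final class JsonLogLine {
    private JsonLogLine() {}

    /** Escapes the characters a JSON string cannot contain verbatim. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Serialises string fields into a single line, so even a stack trace stays on one line. */
    static String toLine(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("@timestamp", Instant.parse("2024-01-01T00:00:00Z").toString());
        fields.put("level", "ERROR");
        fields.put("message", "Order failed\n\tat com.example.OrderService");
        System.out.println(toLine(fields));
    }
}
```

Because newlines are escaped inside the string, a JSON-per-line log needs no multiline settings at all. The multiline options in 2.2 are only for plain-text logs.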
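2.2 Handling Multiline Logs
Merging Java exception stack traces into a single event:
```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    # A line starting with a timestamp begins a new event; every other line
    # (stack trace frames) is appended to the previous event
    multiline.type: pattern
    multiline.pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    multiline.negate: true
    multiline.match: after
    # Keep at most 500 lines per event; flush an unfinished event after 10s
    multiline.max_lines: 500
    multiline.timeout: 10s
```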
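With `negate: true` and `match: after`, every line that does not match the pattern is appended to the preceding line that does. The sketch below reproduces that grouping rule, including the `max_lines` cap (lines beyond it are discarded). It ignores `multiline.timeout` and other Filebeat internals. `MultilineSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Simulates Filebeat's pattern + negate=true + match=after grouping (illustrative only). */
final class MultilineSketch {
    // Same pattern as multiline.pattern above: a timestamp starts a new event
    private static final Pattern NEW_EVENT =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    private MultilineSketch() {}

    static List<String> group(List<String> lines, int maxLines) {
        List<String> events = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            boolean startsEvent = NEW_EVENT.matcher(line).find();
            if (startsEvent && !current.isEmpty()) {
                events.add(String.join("\n", current)); // close the previous event
                current.clear();
            }
            if (current.size() < maxLines) {
                current.add(line); // lines beyond maxLines are dropped, like multiline.max_lines
            }
        }
        if (!current.isEmpty()) {
            events.add(String.join("\n", current));
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2024-01-01 10:00:00 ERROR Order failed",
                "java.lang.IllegalStateException: boom",
                "\tat com.example.OrderService.createOrder(OrderService.java:42)",
                "2024-01-01 10:00:01 INFO Next request");
        group(lines, 500).forEach(e -> System.out.println("EVENT:\n" + e));
    }
}
```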
2.3 Quick Collection with Modules
Enable the Nginx module:
```bash
# List the available modules
filebeat modules list
# Enable the Nginx module
filebeat modules enable nginx
# Configure the module
vim modules.d/nginx.yml
```
```yaml
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log*"]
```
3. APM: Application Performance Monitoring
3.1 Deploying APM Server
docker-compose.yml:
```yaml
services:
  apm-server:
    image: docker.elastic.co/apm/apm-server:8.11.0
    container_name: apm-server
    ports:
      - "8200:8200"
    volumes:
      - ./apm-server.yml:/usr/share/apm-server/apm-server.yml:ro
    depends_on:
      - elasticsearch
      - kibana
    networks:
      - elk-network
```
apm-server.yml:
```yaml
apm-server:
  host: "0.0.0.0:8200"
  # Kibana connection, used for APM agent central configuration
  kibana:
    enabled: true
    host: "http://kibana:5601"

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
```
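3.2 Integrating APM with Spring Boot
Add the dependencies:
```xml
<!-- Agent jar: attached with -javaagent (options 1 and 2), not called from application code -->
<dependency>
    <groupId>co.elastic.apm</groupId>
    <artifactId>elastic-apm-agent</artifactId>
    <version>1.43.0</version>
</dependency>
<!-- Public API (ElasticApm, @CaptureSpan) used for custom tracing in 3.3 -->
<dependency>
    <groupId>co.elastic.apm</groupId>
    <artifactId>apm-agent-api</artifactId>
    <version>1.43.0</version>
</dependency>
```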
Option 1: Java agent
```bash
java -javaagent:/path/to/elastic-apm-agent.jar \
  -Delastic.apm.service_name=my-application \
  -Delastic.apm.server_urls=http://localhost:8200 \
  -Delastic.apm.environment=production \
  -jar my-app.jar
```
Option 2: Maven plugin (these jvmArguments apply only when the app is started with `mvn spring-boot:run`)
```xml
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <jvmArguments>
            -javaagent:${user.home}/.m2/repository/co/elastic/apm/elastic-apm-agent/1.43.0/elastic-apm-agent-1.43.0.jar
            -Delastic.apm.service_name=my-application
            -Delastic.apm.server_urls=http://localhost:8200
        </jvmArguments>
    </configuration>
</plugin>
```
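Option 3: configuration file (this only moves the settings out of the command line; the agent itself must still be attached as in option 1 or 2)
src/main/resources/elasticapm.properties:
```properties
service_name=my-application
server_urls=http://localhost:8200
environment=production
application_packages=com.example
log_level=INFO
# 1.0 samples every transaction; consider lowering it for high-traffic services
transaction_sample_rate=1.0
```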
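`transaction_sample_rate=1.0` keeps every transaction. Below 1.0 only a fraction is sampled. For distributed tracing, the decision has to be made once per trace and propagated downstream so traces stay complete. The sketch below shows one generic, deterministic way to make that decision from a W3C trace id. It illustrates the idea only and is not the Elastic agent's internal sampler. `RatioSampler` is a hypothetical name.

```java
/** Generic ratio sampler keyed on the trace id (illustrative, not the Elastic agent's code). */
final class RatioSampler {
    private final double sampleRate;

    RatioSampler(double sampleRate) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sample rate must be in [0, 1]");
        }
        this.sampleRate = sampleRate;
    }

    /** @param traceId 32 hex characters, as in a W3C traceparent header */
    boolean isSampled(String traceId) {
        if (traceId.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex characters");
        }
        // Treat the last 64 bits of the (random) trace id as a uniform number
        long bits = Long.parseUnsignedLong(traceId.substring(16), 16);
        double position = (bits >>> 11) * 0x1.0p-53; // maps to [0, 1)
        return position < sampleRate;                // same trace id -> same decision
    }

    public static void main(String[] args) {
        RatioSampler sampler = new RatioSampler(0.1);
        System.out.println(sampler.isSampled("4bf92f3577b34da6a3ce929d0e0e4736"));
    }
}
```

Keying on the trace id means every service that sees the same trace reaches the same decision without coordination. An independent `Math.random()` call per service would break traces apart.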
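3.3 Custom Tracing
```java
import co.elastic.apm.api.CaptureSpan;
import co.elastic.apm.api.ElasticApm;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@Slf4j
public class OrderService {

    @Autowired
    private OrderMapper orderMapper;

    /**
     * The agent automatically records the incoming HTTP request as a transaction
     * and the JDBC calls made by orderMapper as spans. This method only becomes
     * its own span if it is annotated (see sendMessage) or listed in trace_methods.
     */
    @Transactional
    public Order createOrder(OrderRequest request) {
        // Create the order
        Order order = new Order();
        order.setUserId(request.getUserId());
        order.setAmount(request.getAmount());
        orderMapper.insert(order);

        // Publish the order message
        sendMessage(order);
        return order;
    }

    /**
     * Custom span via the Elastic APM public API (@CaptureSpan, not a Sentry annotation).
     * Make sure application_packages covers this class's package.
     */
    @CaptureSpan(value = "send_message", type = "messaging")
    private void sendMessage(Order order) {
        ElasticApm.currentSpan()
                .setLabel("order_id", order.getId())
                .setLabel("user_id", order.getUserId());
        // Message-sending logic
        log.info("Sending order message: orderId={}", order.getId());
    }
}
```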
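3.4 Viewing APM Data
- Open Kibana → APM
- Select the service: my-application
- Check the metrics:
  - Throughput (transactions per minute)
  - Latency (response time)
  - Error rate
- Inspect transaction details:
  - Call chain (trace waterfall)
  - Database queries
  - External HTTP calls
  - Performance bottlenecks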
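The three headline metrics above can be derived from raw transaction records. The standard-library sketch below assumes a hypothetical `Txn` type that holds a duration and a failure flag. It computes throughput, average and nearest-rank percentile latency, and error rate. Kibana's own computation (for example, how it estimates percentiles) differs in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Derives APM-style summary metrics from raw transactions (illustrative). */
final class ApmMetricsSketch {
    /** Hypothetical raw transaction: duration in ms and whether it failed. */
    static final class Txn {
        final long durationMs;
        final boolean failed;

        Txn(long durationMs, boolean failed) {
            this.durationMs = durationMs;
            this.failed = failed;
        }
    }

    private ApmMetricsSketch() {}

    /** Transactions per minute over a window of windowSeconds. */
    static double throughputPerMinute(List<Txn> txns, long windowSeconds) {
        return txns.size() * 60.0 / windowSeconds;
    }

    static double averageMs(List<Txn> txns) {
        return txns.stream().mapToLong(t -> t.durationMs).average().orElse(0.0);
    }

    /** Nearest-rank percentile, e.g. p = 95 for the 95th percentile. */
    static long percentileMs(List<Txn> txns, double p) {
        if (txns.isEmpty()) return 0L;
        List<Long> sorted = new ArrayList<>();
        for (Txn t : txns) sorted.add(t.durationMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    /** Share of failed transactions in [0, 1]. */
    static double errorRate(List<Txn> txns) {
        if (txns.isEmpty()) return 0.0;
        long failed = txns.stream().filter(t -> t.failed).count();
        return (double) failed / txns.size();
    }

    public static void main(String[] args) {
        List<Txn> txns = new ArrayList<>();
        for (int i = 1; i <= 10; i++) txns.add(new Txn(i * 10L, i == 10));
        System.out.printf("tpm=%.1f avg=%.1fms p95=%dms errors=%.0f%%%n",
                throughputPerMinute(txns, 60), averageMs(txns),
                percentileMs(txns, 95), errorRate(txns) * 100);
    }
}
```

The average alone hides tail latency, which is why APM views also show percentiles: in the example, the average is 55 ms while p95 is 100 ms.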
4. Business Data Analysis in Practice
4.1 E-Commerce Data Analysis
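User behavior analysis
```java
// UserBehaviorDocument.java
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.time.LocalDateTime;

@Data
@Document(indexName = "user-behavior")
public class UserBehaviorDocument {
    @Id
    private String id;
    @Field(type = FieldType.Keyword)
    private String userId;
    @Field(type = FieldType.Keyword)
    private String eventType; // view, click, add_cart, purchase
    @Field(type = FieldType.Keyword)
    private String productId;
    @Field(type = FieldType.Keyword)
    private String productName;
    @Field(type = FieldType.Double)
    private Double price;
    // LocalDateTime carries no offset, so use a zone-less format (date_time cannot be converted)
    @Field(type = FieldType.Date, format = DateFormat.date_hour_minute_second_millis)
    private LocalDateTime timestamp;
    @Field(type = FieldType.Keyword)
    private String sessionId;
    @Field(type = FieldType.Keyword)
    private String deviceType; // mobile, desktop, tablet
}
```
```java
// UserBehaviorAnalysisService.java
// Written against Spring Data Elasticsearch 4.3/4.4 (Spring Boot 2.6/2.7), which still uses
// the 7.x RestHighLevelClient builders; Spring Data Elasticsearch 5 replaced this API.
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.Cardinality;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Service
public class UserBehaviorAnalysisService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    /** Runs the query and unwraps the client's Aggregations object. */
    private Aggregations aggregate(NativeSearchQuery query) {
        SearchHits<UserBehaviorDocument> hits =
                elasticsearchTemplate.search(query, UserBehaviorDocument.class);
        // Since 4.3, getAggregations() returns an AggregationsContainer wrapper
        return (Aggregations) hits.getAggregations().aggregations();
    }

    /** Time filter; the explicit format also accepts bounds without fractional seconds. */
    private static RangeQueryBuilder timeRange(LocalDateTime start, LocalDateTime end) {
        return QueryBuilders.rangeQuery("timestamp")
                .gte(start.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME))
                .lte(end.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME))
                .format("strict_date_optional_time");
    }

    /**
     * Funnel: view → add_cart → purchase.
     * Counts events per eventType; a per-user funnel would add a cardinality on userId.
     */
    public Map<String, Long> funnelAnalysis(LocalDateTime startTime, LocalDateTime endTime) {
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.boolQuery().filter(timeRange(startTime, endTime)))
                .addAggregation(AggregationBuilders.terms("event_funnel")
                        .field("eventType")
                        .size(10))
                .build();
        Terms eventFunnel = aggregate(query).get("event_funnel");
        Map<String, Long> result = new HashMap<>();
        for (Terms.Bucket bucket : eventFunnel.getBuckets()) {
            result.put(bucket.getKeyAsString(), bucket.getDocCount());
        }
        return result;
    }

    /**
     * Daily active users over the last {@code days} days
     * (a building block for retention; cardinality counts are approximate).
     */
    public Map<String, Object> retentionAnalysis(int days) {
        DateHistogramAggregationBuilder dailyActive = AggregationBuilders
                .dateHistogram("daily_active")
                .field("timestamp")
                .calendarInterval(DateHistogramInterval.DAY)
                .subAggregation(AggregationBuilders.cardinality("unique_users").field("userId"));
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.rangeQuery("timestamp").gte("now-" + days + "d/d"))
                .addAggregation(dailyActive)
                .build();
        Histogram histogram = aggregate(query).get("daily_active");
        List<Map<String, Object>> dailyStats = new ArrayList<>();
        for (Histogram.Bucket bucket : histogram.getBuckets()) {
            Cardinality uniqueUsers = bucket.getAggregations().get("unique_users");
            Map<String, Object> dayData = new HashMap<>();
            dayData.put("date", bucket.getKeyAsString());
            dayData.put("activeUsers", uniqueUsers.getValue());
            dailyStats.add(dayData);
        }
        Map<String, Object> result = new HashMap<>();
        result.put("dailyStats", dailyStats);
        return result;
    }

    /** Top 10 products by purchase events; revenue assumes one unit per purchase event. */
    public List<Map<String, Object>> productRanking(LocalDateTime startTime, LocalDateTime endTime) {
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
                .filter(QueryBuilders.termQuery("eventType", "purchase"))
                .filter(timeRange(startTime, endTime));
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(boolQuery)
                .addAggregation(AggregationBuilders.terms("top_products")
                        .field("productId")
                        .size(10)
                        .subAggregation(AggregationBuilders.sum("total_sales").field("price")))
                .build();
        Terms topProducts = aggregate(query).get("top_products");
        List<Map<String, Object>> ranking = new ArrayList<>();
        for (Terms.Bucket bucket : topProducts.getBuckets()) {
            Sum totalSales = bucket.getAggregations().get("total_sales");
            Map<String, Object> productData = new HashMap<>();
            productData.put("productId", bucket.getKeyAsString());
            productData.put("salesCount", bucket.getDocCount());
            productData.put("totalRevenue", totalSales.getValue());
            ranking.add(productData);
        }
        return ranking;
    }
}
```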
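`funnelAnalysis` returns raw event counts per type, so conversion rates are a simple ratio on top of them. True retention is different from the DAU histogram in `retentionAnalysis`: it needs per-day sets of user ids, because a cardinality aggregation returns only counts. Those sets could come from, for example, a terms or composite aggregation on `userId` per day. The sketch below post-processes both kinds of data. `BehaviorMetrics` is hypothetical, and the step names are assumed to match the `eventType` values listed in `UserBehaviorDocument`.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Post-processing for funnel counts and a day-N retention calculation (illustrative). */
final class BehaviorMetrics {
    private BehaviorMetrics() {}

    /** Step-to-step conversion rates, e.g. view -> add_cart -> purchase. */
    static Map<String, Double> stepConversion(Map<String, Long> counts, List<String> steps) {
        Map<String, Double> rates = new LinkedHashMap<>();
        for (int i = 1; i < steps.size(); i++) {
            long prev = counts.getOrDefault(steps.get(i - 1), 0L);
            long cur = counts.getOrDefault(steps.get(i), 0L);
            rates.put(steps.get(i - 1) + "->" + steps.get(i), prev == 0 ? 0.0 : (double) cur / prev);
        }
        return rates;
    }

    /**
     * Day-N retention: share of the day-0 cohort that is active again on day N.
     * @param activeUsersByDay distinct user ids per day; index 0 is the cohort day
     */
    static double retention(List<Set<String>> activeUsersByDay, int n) {
        Set<String> cohort = activeUsersByDay.get(0);
        if (cohort.isEmpty() || n >= activeUsersByDay.size()) return 0.0;
        Set<String> retained = new HashSet<>(cohort);
        retained.retainAll(activeUsersByDay.get(n)); // users present on both days
        return (double) retained.size() / cohort.size();
    }

    public static void main(String[] args) {
        Map<String, Long> counts = Map.of("view", 1000L, "add_cart", 200L, "purchase", 50L);
        System.out.println(stepConversion(counts, List.of("view", "add_cart", "purchase")));
    }
}
```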
4.2 Real-Time Monitoring Dashboard
Dashboard panel layout. This JSON is a schematic description, not Kibana's saved-object format, so it cannot be imported as-is. Build the panels with Lens and export them via Saved Objects if you need a file.
```json
{
  "title": "E-commerce real-time monitoring dashboard",
  "panels": [
    {
      "title": "Real-time order count",
      "type": "metric",
      "query": {
        "index": "order-*",
        "aggs": {
          "count": { "type": "count" }
        },
        "timeRange": "last_1h"
      }
    },
    {
      "title": "Sales trend",
      "type": "line",
      "query": {
        "index": "order-*",
        "aggs": {
          "sum_amount": {
            "type": "sum",
            "field": "amount"
          }
        },
        "timeRange": "last_24h",
        "interval": "1h"
      }
    },
    {
      "title": "Top 10 products",
      "type": "table",
      "query": {
        "index": "user-behavior",
        "aggs": {
          "top_products": {
            "type": "terms",
            "field": "productName",
            "size": 10
          }
        }
      }
    }
  ]
}
```
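The "Sales trend" panel is a date histogram with a 1h interval and a sum of `amount` in each bucket. The sketch below buckets hypothetical (timestamp, amount) orders the same way, to make that aggregation concrete. It assumes UTC hour buckets; Kibana uses the browser's time zone by default. `HourlySalesSketch` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Mimics date_histogram(1h) + sum(amount) over in-memory orders (illustrative). */
final class HourlySalesSketch {
    /** Hypothetical order record. */
    static final class Order {
        final Instant timestamp;
        final BigDecimal amount;

        Order(Instant timestamp, BigDecimal amount) {
            this.timestamp = timestamp;
            this.amount = amount;
        }
    }

    private HourlySalesSketch() {}

    /** Sums amounts per UTC hour; a TreeMap keeps buckets in time order. */
    static Map<Instant, BigDecimal> sumPerHour(List<Order> orders) {
        Map<Instant, BigDecimal> buckets = new TreeMap<>();
        for (Order o : orders) {
            Instant bucketKey = o.timestamp.truncatedTo(ChronoUnit.HOURS);
            buckets.merge(bucketKey, o.amount, BigDecimal::add);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(Instant.parse("2024-01-01T10:05:00Z"), new BigDecimal("10.50")),
                new Order(Instant.parse("2024-01-01T10:55:00Z"), new BigDecimal("4.50")),
                new Order(Instant.parse("2024-01-01T11:00:00Z"), new BigDecimal("7.00")));
        sumPerHour(orders).forEach((hour, sum) -> System.out.println(hour + " " + sum));
    }
}
```

Unlike Elasticsearch's `date_histogram`, which by default also returns empty buckets between the first and last one, this sketch only emits hours that contain orders.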
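5. Cloud-Native ELK Deployment
5.1 Deploying on Kubernetes
elasticsearch-statefulset.yaml:
```yaml
# Headless Service that gives each pod a stable DNS name such as elasticsearch-0.elasticsearch
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      name: http
    - port: 9300
      name: transport
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      # The Elasticsearch image runs as uid/gid 1000 and must be able to write to the volume
      securityContext:
        fsGroup: 1000
      initContainers:
        # Elasticsearch's bootstrap checks require vm.max_map_count >= 262144
        - name: sysctl
          image: busybox:1.36
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          image: elasticsearch:8.11.0
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          env:
            - name: discovery.seed_hosts
              value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
            - name: ES_JAVA_OPTS
              value: "-Xms2g -Xmx2g"
            # Demo only; production clusters need security plus TLS (see the 6.2 checklist)
            - name: xpack.security.enabled
              value: "false"
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```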
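The `discovery.seed_hosts` and `cluster.initial_master_nodes` values follow Kubernetes' stable StatefulSet naming. Pod *i* is named `<statefulset>-<i>` and is reachable at `<pod>.<headless-service>` within the namespace. The sketch below derives both values from the StatefulSet name, the Service name and the replica count. `StatefulSetNames` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Derives Elasticsearch discovery settings from StatefulSet naming (illustrative). */
final class StatefulSetNames {
    private StatefulSetNames() {}

    /** Pod names <statefulset>-0 .. <statefulset>-(replicas-1). */
    static List<String> podNames(String statefulSet, int replicas) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            names.add(statefulSet + "-" + i);
        }
        return names;
    }

    /** Value for discovery.seed_hosts: <pod>.<headless service> for every pod. */
    static String seedHosts(String statefulSet, String headlessService, int replicas) {
        List<String> hosts = new ArrayList<>();
        for (String pod : podNames(statefulSet, replicas)) {
            hosts.add(pod + "." + headlessService);
        }
        return String.join(",", hosts);
    }

    /** Value for cluster.initial_master_nodes: must equal node.name, which defaults to the pod hostname. */
    static String initialMasterNodes(String statefulSet, int replicas) {
        return String.join(",", podNames(statefulSet, replicas));
    }

    public static void main(String[] args) {
        System.out.println(seedHosts("elasticsearch", "elasticsearch", 3));
        System.out.println(initialMasterNodes("elasticsearch", 3));
    }
}
```

Changing `replicas` means regenerating both values. Elastic's guidance is to remove `cluster.initial_master_nodes` once the cluster has formed for the first time.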
kibana-deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
spec:
  replicas: 2
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: kibana:8.11.0
          env:
            - name: ELASTICSEARCH_HOSTS
              value: "http://elasticsearch:9200"
          ports:
            - containerPort: 5601
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
```
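5.2 Deploying with Helm Charts
```bash
# Add the Elastic Helm repository
# (for 8.x, Elastic recommends its ECK operator over these charts)
helm repo add elastic https://helm.elastic.co

# Install Elasticsearch
helm install elasticsearch elastic/elasticsearch \
  --set replicas=3 \
  --set resources.requests.memory=2Gi \
  --set resources.limits.memory=4Gi

# Install Kibana (the Elasticsearch chart's default Service is elasticsearch-master, served over HTTPS)
helm install kibana elastic/kibana \
  --set elasticsearchHosts=https://elasticsearch-master:9200

# Install Filebeat
helm install filebeat elastic/filebeat
```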
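6. Best Practices Summary
6.1 Design Principles
✅ Index design
- Size shards sensibly (about 10-50GB per shard)
- Manage the index lifecycle with ILM
- Keep hot and cold data on separate tiers
✅ Query optimization
- Prefer filter context over query context when no scoring is needed (filters are cacheable)
- Avoid deep pagination; use search_after instead
- Use caches deliberately (node query cache for filters, shard request cache for repeated aggregations)
✅ Write optimization
- Write in bulk to cut network round trips
- Tune refresh_interval (for example, a longer interval during heavy ingestion)
- Asynchronous translog (index.translog.durability: async), accepting that writes since the last sync may be lost on a crash
✅ Resource management
- Leave at least 50% of RAM to the filesystem cache
- Disable swap
- Use SSD storage
✅ Monitoring and alerting
- Monitor cluster health
- Set sensible alert thresholds
- Review slow queries (slow log) regularly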
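The 10-50GB guideline turns into a primary shard count with simple arithmetic. The sketch below assumes a target of 30GB per primary (a value picked from within the stated range) and a known expected index size. It also rounds up to a multiple of the hot data node count, a common choice so primaries spread evenly. For indices managed by the rollover in 1.3, `max_primary_shard_size` already caps each primary at 50GB, so this matters most for the template's `number_of_shards`. `ShardSizing` is hypothetical.

```java
/** Primary-shard arithmetic for the 10-50 GB guideline (illustrative). */
final class ShardSizing {
    private ShardSizing() {}

    /** Smallest primary count that keeps each primary at or below targetShardGb. */
    static int primaryShards(double expectedIndexGb, double targetShardGb) {
        if (expectedIndexGb <= 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) Math.ceil(expectedIndexGb / targetShardGb);
    }

    /** Rounds up to a multiple of the data node count so each node holds the same number of primaries. */
    static int alignToNodes(int primaries, int dataNodes) {
        if (dataNodes < 1) {
            throw new IllegalArgumentException("need at least one data node");
        }
        return ((primaries + dataNodes - 1) / dataNodes) * dataNodes;
    }

    public static void main(String[] args) {
        int primaries = primaryShards(120, 30);   // 120 GB at 30 GB per primary
        int aligned = alignToNodes(primaries, 2); // two hot nodes, as in the compose file in 1.2
        System.out.printf("primaries=%d aligned=%d avg=%.1f GB%n",
                primaries, aligned, 120.0 / aligned);
    }
}
```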
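6.2 Checklist
```text
Production readiness checklist:
Architecture:
  ✓ Master and data roles separated
  ✓ At least 3 master-eligible nodes
  ✓ Data nodes scaled with load
  ✓ ILM policies enabled
Performance:
  ✓ JVM heap below ~31GB so compressed oops stay enabled (never above 32GB)
  ✓ Shard size kept within 10-50GB
  ✓ Bulk writes
  ✓ Filters used for cacheable queries
Security:
  ✓ X-Pack Security enabled
  ✓ SSL/TLS configured
  ✓ Fine-grained permissions
  ✓ Passwords rotated regularly
Backup and restore:
  ✓ Snapshot repository configured
  ✓ Scheduled automatic snapshots
  ✓ Restores tested regularly
  ✓ Off-site backups
Monitoring and operations:
  ✓ Stack Monitoring deployed
  ✓ Alert rules configured
  ✓ Log rotation policy
  ✓ Capacity planning
```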
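The heap item and the "leave 50% of RAM to the filesystem cache" rule from 6.1 combine into one rule of thumb: heap = min(half of RAM, just under the compressed-oops threshold). The sketch below assumes a 31GB cap. The exact cutoff depends on the JVM, so check the node's startup log for `compressed ordinary object pointers [true]`. Elasticsearch 8.x also sizes the heap automatically when `ES_JAVA_OPTS` does not set it, so explicit values are optional. `HeapSizing` is hypothetical.

```java
/** Heap sizing rule of thumb for Elasticsearch nodes (illustrative). */
final class HeapSizing {
    /** Assumed cap that keeps compressed oops on typical JVMs; verify in the startup log. */
    static final long COMPRESSED_OOPS_CAP_MB = 31L * 1024;

    private HeapSizing() {}

    /** Half of RAM for the heap (the rest goes to the filesystem cache), capped at ~31 GB. */
    static long heapMb(long ramMb) {
        return Math.min(ramMb / 2, COMPRESSED_OOPS_CAP_MB);
    }

    /** Equal -Xms and -Xmx so the heap never resizes at runtime. */
    static String javaOpts(long ramMb) {
        long heap = heapMb(ramMb);
        return "-Xms" + heap + "m -Xmx" + heap + "m";
    }

    public static void main(String[] args) {
        for (long ramGb : new long[] {8, 16, 64, 128}) {
            System.out.println(ramGb + " GB RAM -> ES_JAVA_OPTS=\"" + javaOpts(ramGb * 1024) + "\"");
        }
    }
}
```

On a 128GB machine the cap applies and the heap stays at 31GB. The remaining memory still helps, because Lucene relies on the filesystem cache.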
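7. Summary and Outlook
7.1 Series Recap
Part 1: Core Basics and Quick Start
- ✅ ELK core concepts
- ✅ Basic Elasticsearch operations
- ✅ Spring Boot integration
- ✅ Log collection with Logstash
- ✅ Visualization with Kibana
Part 2: Advanced Features and Performance Tuning
- ✅ Advanced queries and aggregations
- ✅ Performance-tuning techniques
- ✅ Advanced Logstash processing
- ✅ Advanced Kibana visualization
- ✅ Security hardening
- ✅ Backup and restore
Part 3: Advanced Applications and Architecture Design
- ✅ Large-scale cluster architecture
- ✅ Data collection with Filebeat
- ✅ APM application performance monitoring
- ✅ Business data analysis in practice
- ✅ Cloud-native deployment
- ✅ Best-practice summary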
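7.2 Learning Roadmap
```text
Beginner (1-2 weeks):
├─ Understand ELK core concepts
├─ Set up a single-node environment
└─ Complete the basic integration
Intermediate (2-4 weeks):
├─ Master advanced queries
├─ Performance tuning
├─ Configure ILM
└─ Build a monitoring system
Advanced (ongoing):
├─ Design large-scale clusters
├─ APM and distributed tracing
├─ Business data analysis
└─ Cloud-native deployment
```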
7.3 Future Directions
- Elasticsearch 9.0: enhanced vector search and AI integration
- Serverless: serverless architectures
- Edge Computing: edge computing scenarios
- Multi-Cloud: multi-cloud deployment
🎉 Congratulations! You have completed the full ELK journey from beginner to expert!