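Learning objectives
Cover the full monitoring and operations stack for microservices: health checks with Spring Boot Actuator, log analysis with the ELK stack, metrics collection with Prometheus, dashboard design with Grafana, APM performance monitoring, and DevOps automation practices.
1. Microservice monitoring basics
1.1 Spring Boot Actuator health checks
Actuator configuration: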
yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: "*"
      base-path: /actuator
  endpoint:
    health:
      show-details: when-authorized
      show-components: always
    info:
      enabled: true
    metrics:
      enabled: true
    prometheus:
      enabled: true
  health:
    defaults:
      enabled: true
    redis:
      enabled: true
    db:
      enabled: true
    diskspace:
      enabled: true
      threshold: 100MB
  metrics:
    export:
      prometheus:
        enabled: true
    distribution:
      percentiles-histogram:
        http.server.requests: true
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99
      slo:
        http.server.requests: 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s
  info:
    env:
      enabled: true
    java:
      enabled: true
    os:
      enabled: true
    build:
      enabled: true
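The `slo` entry above asks for histogram buckets at fixed latency boundaries. As a rough illustration of what such buckets hold, here is a minimal JDK-only sketch, assuming Prometheus-style cumulative buckets where each bucket counts every observation less than or equal to its upper bound. The class `SloBucketSketch` and its method are hypothetical names for this example, not part of Micrometer.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: counts observations into cumulative "le" buckets. */
final class SloBucketSketch {

    /** Returns one cumulative count per boundary, followed by a final +Inf bucket. */
    static long[] cumulativeCounts(List<Duration> boundaries, List<Duration> observations) {
        long[] counts = new long[boundaries.size() + 1];
        for (Duration observed : observations) {
            for (int i = 0; i < boundaries.size(); i++) {
                // A bucket with upper bound "le" counts every observation <= le.
                if (observed.compareTo(boundaries.get(i)) <= 0) {
                    counts[i]++;
                }
            }
            // The +Inf bucket counts every observation.
            counts[boundaries.size()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Duration> slo = List.of(
                Duration.ofMillis(50), Duration.ofMillis(100), Duration.ofMillis(200));
        List<Duration> seen = List.of(
                Duration.ofMillis(30), Duration.ofMillis(120), Duration.ofMillis(900));
        System.out.println(Arrays.toString(cumulativeCounts(slo, seen)));
    }
}
```

Because the counts are cumulative, a query can later estimate the share of requests under any configured boundary by dividing one bucket by the +Inf bucket.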
Custom health checks:
java
// Database health check
@Component
@Slf4j
public class DatabaseHealthIndicator implements HealthIndicator {
@Autowired
private DataSource dataSource;
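    @Override
    public Health health() {
        long start = System.nanoTime();
        try (Connection connection = dataSource.getConnection()) {
            // isValid takes a timeout in seconds
            if (connection.isValid(1)) {
                // Report the elapsed time of the check; the original reported the
                // current wall-clock timestamp under the name "responseTime"
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                return Health.up()
                        .withDetail("database", "MySQL")
                        .withDetail("validation", "Connection.isValid(1)")
                        .withDetail("responseTimeMs", elapsedMs)
                        .build();
            }
        } catch (SQLException e) {
            log.error("Database health check failed", e);
            return Health.down()
                    .withDetail("database", "MySQL")
                    .withDetail("error", e.getMessage())
                    .build();
        }
        return Health.down()
                .withDetail("database", "MySQL")
                .withDetail("error", "Connection is not valid")
                .build();
    }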
}
// Redis health check
@Component
@Slf4j
public class RedisHealthIndicator implements HealthIndicator {
@Autowired
private RedisTemplate<String, String> redisTemplate;
@Override
public Health health() {
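        long start = System.nanoTime();
        try {
            // execute() borrows a connection and releases it afterwards; calling
            // getConnectionFactory().getConnection().ping() directly never closes the connection.
            // RedisCallback is org.springframework.data.redis.core.RedisCallback.
            String pong = redisTemplate.execute((RedisCallback<String>) connection -> connection.ping());
            if ("PONG".equals(pong)) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                return Health.up()
                        .withDetail("redis", "Redis")
                        .withDetail("response", pong)
                        .withDetail("responseTimeMs", elapsedMs)
                        .build();
            }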
} catch (Exception e) {
log.error("Redis健康检查失败", e);
return Health.down()
.withDetail("redis", "Redis")
.withDetail("error", e.getMessage())
.build();
}
return Health.down()
.withDetail("redis", "Redis")
.withDetail("error", "连接失败")
.build();
}
}
// Business-level health check
@Component
@Slf4j
public class BusinessHealthIndicator implements HealthIndicator {
@Autowired
private UserService userService;
@Autowired
private OrderService orderService;
@Override
public Health health() {
try {
// 检查用户服务
long userCount = userService.getUserCount();
if (userCount < 0) {
return Health.down()
.withDetail("userService", "用户服务异常")
.withDetail("userCount", userCount)
.build();
}
// 检查订单服务
long orderCount = orderService.getOrderCount();
if (orderCount < 0) {
return Health.down()
.withDetail("orderService", "订单服务异常")
.withDetail("orderCount", orderCount)
.build();
}
return Health.up()
.withDetail("userService", "正常")
.withDetail("userCount", userCount)
.withDetail("orderService", "正常")
.withDetail("orderCount", orderCount)
.build();
} catch (Exception e) {
log.error("业务健康检查失败", e);
return Health.down()
.withDetail("error", e.getMessage())
.build();
}
}
}
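With several indicators registered, the `/actuator/health` endpoint reports a single overall status. The following JDK-only sketch shows the "most severe status wins" idea, assuming the severity order DOWN, OUT_OF_SERVICE, UP, UNKNOWN. The class `HealthAggregationSketch` is a hypothetical illustration and not Actuator's own implementation.

```java
import java.util.List;

/** Hypothetical sketch of "worst status wins" aggregation across health components. */
final class HealthAggregationSketch {

    /** Declared from most severe to least severe (assumed order for this sketch). */
    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    /** Returns the most severe status, or UNKNOWN when there are no components. */
    static Status aggregate(List<Status> components) {
        Status worst = Status.UNKNOWN;
        for (Status status : components) {
            // A lower ordinal means a more severe status.
            if (status.ordinal() < worst.ordinal()) {
                worst = status;
            }
        }
        return worst;
    }
}
```

Under this rule a single failing indicator, for example the Redis check, is enough to turn the overall status to DOWN, which matters when the same endpoint backs a Kubernetes probe.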
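1.2 Custom metrics
Business metrics:
java
// Metrics configuration. Micrometer needs no enabling annotation here; @EnableMetrics
// belongs to the Dropwizard metrics-spring library, not to Micrometer, so it is omitted.
@Configuration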
@Slf4j
public class MetricsConfig {
@Bean
public MeterRegistry meterRegistry() {
return new SimpleMeterRegistry();
}
@Bean
public TimedAspect timedAspect(MeterRegistry registry) {
return new TimedAspect(registry);
}
@Bean
public CountedAspect countedAspect(MeterRegistry registry) {
return new CountedAspect(registry);
}
}
// Business metrics service
@Service
@Slf4j
public class BusinessMetricsService {
private final Counter userRegistrationCounter;
private final Counter orderCreationCounter;
private final Timer orderProcessingTimer;
private final Gauge activeUserGauge;
private final DistributionSummary orderAmountSummary;
public BusinessMetricsService(MeterRegistry meterRegistry) {
this.userRegistrationCounter = Counter.builder("user.registration.total")
.description("用户注册总数")
.tag("type", "registration")
.register(meterRegistry);
this.orderCreationCounter = Counter.builder("order.creation.total")
.description("订单创建总数")
.tag("type", "creation")
.register(meterRegistry);
this.orderProcessingTimer = Timer.builder("order.processing.duration")
.description("订单处理时间")
.register(meterRegistry);
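// Gauge.builder takes the observed object and the value function up front;
// register() only takes the registry
this.activeUserGauge = Gauge.builder("user.active.count", this, BusinessMetricsService::getActiveUserCount)
.description("活跃用户数")
.register(meterRegistry);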
this.orderAmountSummary = DistributionSummary.builder("order.amount.distribution")
.description("订单金额分布")
.baseUnit("CNY")
.register(meterRegistry);
}
public void incrementUserRegistration() {
userRegistrationCounter.increment();
log.info("用户注册指标更新");
}
public void incrementOrderCreation() {
orderCreationCounter.increment();
log.info("订单创建指标更新");
}
public void recordOrderProcessingTime(Duration duration) {
orderProcessingTimer.record(duration);
log.info("订单处理时间记录: {}ms", duration.toMillis());
}
public void recordOrderAmount(BigDecimal amount) {
orderAmountSummary.record(amount.doubleValue());
log.info("订单金额记录: {}", amount);
}
private double getActiveUserCount() {
// 实际业务逻辑:获取活跃用户数
return 1000.0; // 示例值
}
}
// Metrics aspect
@Aspect
@Component
@Slf4j
public class MetricsAspect {
@Autowired
private BusinessMetricsService metricsService;
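// @PostMapping sits on controller methods, so combining it with a service-package
// execution pointcut would almost never match; the service methods are targeted directly
@Around("execution(* com.example.service.*.create*(..))")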
public Object monitorCreateOperations(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
// 记录成功指标
if (methodName.contains("User")) {
metricsService.incrementUserRegistration();
} else if (methodName.contains("Order")) {
metricsService.incrementOrderCreation();
}
return result;
} catch (Exception e) {
// 记录失败指标
log.error("操作失败: {}", methodName, e);
throw e;
} finally {
long duration = System.currentTimeMillis() - startTime;
log.info("方法执行时间: {}ms", duration);
}
}
// Same reasoning as above: match the service methods directly instead of requiring @GetMapping
@Around("execution(* com.example.service.*.get*(..))")
public Object monitorGetOperations(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
long startTime = System.currentTimeMillis();
try {
return joinPoint.proceed();
} finally {
long duration = System.currentTimeMillis() - startTime;
log.info("查询方法执行时间: {}ms", duration);
}
}
}
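The timer and counter meters used above boil down to a few running aggregates. As a minimal JDK-only sketch of that idea, assuming a timer that tracks only count, total time and maximum, the hypothetical `TimerSketch` class below shows what gets accumulated on each `record` call. A real Micrometer `Timer` does more, for example a time-windowed maximum and optional histograms.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical, simplified timer: thread-safe count, total and max of recorded durations. */
final class TimerSketch {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Records one observation. */
    void record(Duration duration) {
        long nanos = duration.toNanos();
        count.increment();
        totalNanos.add(nanos);
        // Keep the largest value seen so far (no decay in this sketch).
        maxNanos.accumulateAndGet(nanos, Math::max);
    }

    long count() {
        return count.sum();
    }

    double totalMillis() {
        return totalNanos.sum() / 1_000_000.0;
    }

    double maxMillis() {
        return maxNanos.get() / 1_000_000.0;
    }

    /** Mean duration in milliseconds, or 0 when nothing was recorded. */
    double meanMillis() {
        long n = count.sum();
        return n == 0 ? 0.0 : totalMillis() / n;
    }
}
```

Exporting count and total separately, rather than a precomputed mean, lets the monitoring backend compute averages over any time window.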
2. ELK log analysis
2.1 Elasticsearch configuration
Elasticsearch configuration:
yaml
# elasticsearch.yml
cluster.name: microservices-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["localhost"]
cluster.initial_master_nodes: ["node-1"]
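# Index-level settings (number_of_shards, number_of_replicas, refresh_interval) cannot be set
# in elasticsearch.yml since Elasticsearch 5.0; they belong in an index template (section 2.3).
# Data and log paths
path:
  data: /usr/share/elasticsearch/data
  logs: /usr/share/elasticsearch/logs
# Lock process memory to prevent swapping
bootstrap.memory_lock: true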
Application logging configuration:
xml
<!-- logback-spring.xml -->
<configuration>
<springProfile name="!prod">
<include resource="org/springframework/boot/logging/logback/defaults.xml"/>
<include resource="org/springframework/boot/logging/logback/console-appender.xml"/>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
</root>
</springProfile>
<springProfile name="prod">
<!-- Console output -->
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
<providers>
<timestamp/>
<logLevel/>
<loggerName/>
<mdc/>
<message/>
<stackTrace/>
</providers>
</encoder>
</appender>
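<!-- File output. The %i token and maxFileSize require SizeAndTimeBasedRollingPolicy;
     plain TimeBasedRollingPolicy does not support them. -->
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>/app/logs/application.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">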
<fileNamePattern>/app/logs/application.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
<maxFileSize>100MB</maxFileSize>
<maxHistory>30</maxHistory>
<totalSizeCap>3GB</totalSizeCap>
</rollingPolicy>
<encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
<providers>
<timestamp/>
<logLevel/>
<loggerName/>
<mdc/>
<message/>
<stackTrace/>
</providers>
</encoder>
</appender>
<!-- Asynchronous output -->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
<appender-ref ref="FILE"/>
<queueSize>1024</queueSize>
<discardingThreshold>0</discardingThreshold>
<includeCallerData>true</includeCallerData>
</appender>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
<appender-ref ref="ASYNC"/>
</root>
</springProfile>
</configuration>
2.2 Logstash configuration
Logstash pipeline configuration:
ruby
# logstash.conf
input {
beats {
port => 5044
}
tcp {
port => 5000
codec => json_lines
}
file {
path => "/app/logs/*.log"
start_position => "beginning"
codec => json
}
}
filter {
if [fields][service] == "user-service" {
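grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} \[%{DATA:traceId},%{DATA:spanId}\] %{DATA:logger} - %{GREEDYDATA:message}" }
# Without overwrite, grok appends the capture and turns "message" into an array
overwrite => [ "message" ]
}
date {
# TIMESTAMP_ISO8601 also matches timestamps with a "T" separator or milliseconds,
# so several formats are accepted here
match => [ "timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss" ]
}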
mutate {
add_field => { "service_name" => "user-service" }
add_field => { "environment" => "production" }
}
}
if [fields][service] == "order-service" {
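grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} \[%{DATA:traceId},%{DATA:spanId}\] %{DATA:logger} - %{GREEDYDATA:message}" }
# Without overwrite, grok appends the capture and turns "message" into an array
overwrite => [ "message" ]
}
date {
# TIMESTAMP_ISO8601 also matches timestamps with a "T" separator or milliseconds,
# so several formats are accepted here
match => [ "timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss" ]
}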
mutate {
add_field => { "service_name" => "order-service" }
add_field => { "environment" => "production" }
}
}
# Flag error-level events for alerting
if [level] == "ERROR" {
mutate {
add_field => { "alert_level" => "high" }
add_field => { "notification_required" => "true" }
}
}
# Extract the execution time from performance log lines
if [message] =~ /执行时间/ {
grok {
match => { "message" => "方法执行时间: %{NUMBER:execution_time}ms" }
}
mutate {
convert => { "execution_time" => "integer" }
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "microservices-logs-%{+YYYY.MM.dd}"
template_name => "microservices-logs"
template => "/usr/share/logstash/templates/microservices-logs.json"
template_overwrite => true
}
# Error events are additionally written to a separate index
if [level] == "ERROR" {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "error-logs-%{+YYYY.MM.dd}"
}
}
# Performance events are additionally written to a separate index
if [execution_time] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "performance-logs-%{+YYYY.MM.dd}"
}
}
}
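The output section above fans each event out to one, two or three daily indices. A small JDK-only sketch of that routing rule follows, assuming the date suffix is derived from the event timestamp in UTC. The class `IndexRoutingSketch` and its parameters are hypothetical and only mirror the conditionals in the pipeline.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the index routing expressed by the Logstash output block. */
final class IndexRoutingSketch {
    // Daily suffix such as 2024.03.05, computed in UTC (an assumption of this sketch).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    /**
     * @param level           parsed log level, for example "INFO" or "ERROR"
     * @param executionTimeMs parsed execution time, or null when the field is absent
     * @param timestamp       event timestamp
     */
    static List<String> targetIndices(String level, Integer executionTimeMs, Instant timestamp) {
        String day = DAY.format(timestamp);
        List<String> indices = new ArrayList<>();
        // Every event goes to the main index.
        indices.add("microservices-logs-" + day);
        // Error events are copied to the error index.
        if ("ERROR".equals(level)) {
            indices.add("error-logs-" + day);
        }
        // Events carrying an execution time are copied to the performance index.
        if (executionTimeMs != null) {
            indices.add("performance-logs-" + day);
        }
        return indices;
    }
}
```

Note that an error event with an execution time is stored three times under this scheme, which is worth keeping in mind when sizing storage.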
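2.3 Kibana visualization
Elasticsearch index template for the microservices-logs-* indices (the Kibana index pattern should use the same microservices-logs-* pattern):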
json
{
"index_patterns": ["microservices-logs-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.refresh_interval": "1s"
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"level": {
"type": "keyword"
},
"service_name": {
"type": "keyword"
},
"environment": {
"type": "keyword"
},
"traceId": {
"type": "keyword"
},
"spanId": {
"type": "keyword"
},
"message": {
"type": "text",
"analyzer": "standard"
},
"execution_time": {
"type": "integer"
},
"alert_level": {
"type": "keyword"
},
"notification_required": {
"type": "boolean"
}
}
}
}
Kibana dashboard export:
json
{
"version": "8.0.0",
"kibana": {
"version": "8.0.0"
},
"saved_objects": [
{
"id": "microservices-dashboard",
"type": "dashboard",
"attributes": {
"title": "Microservices monitoring dashboard",
"panelsJSON": "[{\"version\":\"8.0.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"1\"},\"panelIndex\":\"1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"Error log trend\",\"visState\":\"{\\\"title\\\":\\\"Error log trend\\\",\\\"type\\\":\\\"histogram\\\",\\\"params\\\":{\\\"grid\\\":{\\\"categoryLines\\\":false,\\\"style\\\":{\\\"color\\\":\\\"#eee\\\"}},\\\"categoryAxes\\\":[{\\\"id\\\":\\\"CategoryAxis-1\\\",\\\"type\\\":\\\"category\\\",\\\"position\\\":\\\"bottom\\\",\\\"show\\\":true,\\\"style\\\":{},\\\"scale\\\":{\\\"type\\\":\\\"linear\\\"},\\\"labels\\\":{\\\"show\\\":true,\\\"truncate\\\":100},\\\"title\\\":{}}],\\\"valueAxes\\\":[{\\\"id\\\":\\\"ValueAxis-1\\\",\\\"name\\\":\\\"LeftAxis-1\\\",\\\"type\\\":\\\"value\\\",\\\"position\\\":\\\"left\\\",\\\"show\\\":true,\\\"style\\\":{},\\\"scale\\\":{\\\"type\\\":\\\"linear\\\",\\\"mode\\\":\\\"normal\\\"},\\\"labels\\\":{\\\"show\\\":true,\\\"rotate\\\":0,\\\"filter\\\":false,\\\"truncate\\\":100},\\\"title\\\":{\\\"text\\\":\\\"Count\\\"}}],\\\"seriesParams\\\":[{\\\"data\\\":{\\\"id\\\":\\\"1\\\",\\\"label\\\":\\\"Count\\\"},\\\"type\\\":\\\"histogram\\\",\\\"mode\\\":\\\"stacked\\\",\\\"show\\\":true,\\\"valueAxis\\\":\\\"ValueAxis-1\\\",\\\"drawLinesBetweenPoints\\\":true,\\\"showCircles\\\":true}],\\\"addTooltip\\\":true,\\\"addLegend\\\":true,\\\"legendPosition\\\":\\\"right\\\",\\\"times\\\":[],\\\"addTimeMarker\\\":false},\\\"aggs\\\":[{\\\"id\\\":\\\"1\\\",\\\"type\\\":\\\"count\\\",\\\"schema\\\":\\\"metric\\\",\\\"params\\\":{}},{\\\"id\\\":\\\"2\\\",\\\"type\\\":\\\"date_histogram\\\",\\\"schema\\\":\\\"segment\\\",\\\"params\\\":{\\\"field\\\":\\\"@timestamp\\\",\\\"interval\\\":\\\"auto\\\",\\\"customInterval\\\":\\\"2h\\\",\\\"min_doc_count\\\":1,\\\"extended_bounds\\\":{}}},{\\\"id\\\":\\\"3\\\",\\\"type\\\":\\\"filters\\\",\\\"schema\\\":\\\"group\\\",\\\"params\\\":{\\\"filters\\\":[{\\\"input\\\":{\\\"query\\\":{\\\"query_string\\\":{\\\"query\\\":\\\"level:ERROR\\\",\\\"analyze_wildcard\\\":true}}},\\\"label\\\":\\\"Error\\\"}]}}]}\"}}}]"
}
}
]
}
3. Prometheus metrics monitoring
3.1 Prometheus configuration
Prometheus configuration:
yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alert_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'user-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['user-service:8081']
    scrape_interval: 5s
  - job_name: 'order-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['order-service:8082']
    scrape_interval: 5s
  - job_name: 'api-gateway'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api-gateway:8080']
    scrape_interval: 5s
  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql-exporter:9104']
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']
Alerting rules:
yaml
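# alert_rules.yml
groups:
  - name: microservices
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses per instance. The bare rate() of 5xx
        # requests is an absolute errors-per-second figure, not a percentage.
        expr: |
          sum by (instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum by (instance) (rate(http_server_requests_seconds_count[5m])) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate"
          description: "Instance {{ $labels.instance }} has a 5xx error ratio above 10%, current value: {{ $value }}"
      - alert: HighResponseTime
        # Buckets are summed per instance before the quantile is estimated
        expr: histogram_quantile(0.95, sum by (le, instance) (rate(http_server_requests_seconds_bucket[5m]))) > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High response time"
          description: "Instance {{ $labels.instance }} has a 95th percentile response time above 1 second, current value: {{ $value }}s"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service down"
          description: "Instance {{ $labels.instance }} is down"
      - alert: HighMemoryUsage
        # Heap only, summed per instance; individual pools can report a max of -1
        expr: |
          sum by (instance) (jvm_memory_used_bytes{area="heap"})
            / sum by (instance) (jvm_memory_max_bytes{area="heap"}) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Instance {{ $labels.instance }} has heap usage above 80%, current value: {{ $value }}"
      - alert: HighCPUUsage
        expr: system_cpu_usage > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
          description: "Instance {{ $labels.instance }} has CPU usage above 80%, current value: {{ $value }}"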
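The response-time rule relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts rather than from raw samples. Below is a minimal JDK-only sketch of that estimation, assuming linear interpolation inside the bucket that contains the target rank and a lower bound of 0 for the first bucket. `HistogramQuantileSketch` is a hypothetical name; the real function handles more edge cases.

```java
/** Hypothetical sketch of quantile estimation from cumulative histogram buckets. */
final class HistogramQuantileSketch {

    /**
     * @param q           target quantile, 0 < q <= 1
     * @param upperBounds ascending bucket upper bounds; the last one must be +Infinity
     * @param cumulative  cumulative observation count for each bucket
     */
    static double quantile(double q, double[] upperBounds, double[] cumulative) {
        if (q <= 0 || q > 1) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (upperBounds.length < 2 || upperBounds.length != cumulative.length) {
            throw new IllegalArgumentException("need at least two buckets with matching counts");
        }
        int last = upperBounds.length - 1;
        double total = cumulative[last];
        if (total == 0) {
            return Double.NaN; // no observations
        }
        double rank = q * total;
        // Find the first bucket whose cumulative count reaches the rank.
        int i = 0;
        while (i < last && cumulative[i] < rank) {
            i++;
        }
        if (i == last) {
            // The rank falls into the +Inf bucket: report the highest finite bound.
            return upperBounds[last - 1];
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        double countBelow = (i == 0) ? 0.0 : cumulative[i - 1];
        double inBucket = cumulative[i] - countBelow;
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[i] - lower) * (rank - countBelow) / inBucket;
    }
}
```

The estimate can never be more precise than the bucket layout, which is why the `slo` boundaries in section 1.1 should bracket the thresholds used in alerts such as the 1 second limit here.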
3.2 Custom metrics collection
Custom metrics collector:
java
// Custom metrics collector
@Component
@Slf4j
public class CustomMetricsCollector {
    private final MeterRegistry meterRegistry;
    private final Timer businessOperationTimer;
    private final Gauge businessQueueSize;
    private final DistributionSummary businessAmountSummary;

    public CustomMetricsCollector(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        this.businessOperationTimer = Timer.builder("business.operation.duration")
                .description("Business operation duration")
                .register(meterRegistry);
        // Gauge.builder takes the observed object and the value function up front;
        // register() only takes the registry
        this.businessQueueSize = Gauge.builder("business.queue.size", this, CustomMetricsCollector::getQueueSize)
                .description("Business queue size")
                .register(meterRegistry);
        this.businessAmountSummary = DistributionSummary.builder("business.amount.distribution")
                .description("Business amount distribution")
                .baseUnit("CNY")
                .register(meterRegistry);
    }

    public void recordBusinessOperation(String operation, Duration duration) {
        // Counter.increment() accepts no tags, so the counter is looked up (or created on
        // first use) per operation; every series of this metric carries the same tag keys
        Counter.builder("business.operation.total")
                .description("Total business operations")
                .tag("type", "operation")
                .tag("operation", operation)
                .register(meterRegistry)
                .increment();
        businessOperationTimer.record(duration);
        log.info("Business operation recorded: {} took {}ms", operation, duration.toMillis());
    }
public void recordBusinessAmount(BigDecimal amount) {
businessAmountSummary.record(amount.doubleValue());
log.info("业务金额记录: {}", amount);
}
private double getQueueSize() {
// 实际业务逻辑:获取队列大小
return 100.0; // 示例值
}
}
// Metrics collection aspect
@Aspect
@Component
@Slf4j
public class MetricsCollectionAspect {
@Autowired
private CustomMetricsCollector metricsCollector;
@Around("@annotation(org.springframework.web.bind.annotation.PostMapping)")
public Object collectPostMetrics(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
metricsCollector.recordBusinessOperation("POST_" + methodName, duration);
return result;
} catch (Exception e) {
Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
metricsCollector.recordBusinessOperation("POST_" + methodName + "_ERROR", duration);
throw e;
}
}
@Around("@annotation(org.springframework.web.bind.annotation.GetMapping)")
public Object collectGetMetrics(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
metricsCollector.recordBusinessOperation("GET_" + methodName, duration);
return result;
} catch (Exception e) {
Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
metricsCollector.recordBusinessOperation("GET_" + methodName + "_ERROR", duration);
throw e;
}
}
}
4. Grafana dashboard design
4.1 Grafana dashboard configuration
Grafana dashboard JSON:
json
{
"dashboard": {
"id": null,
"title": "微服务监控仪表盘",
"tags": ["microservices", "monitoring"],
"style": "dark",
"timezone": "browser",
"refresh": "5s",
"time": {
"from": "now-1h",
"to": "now"
},
"panels": [
{
"id": 1,
"title": "服务状态",
"type": "stat",
"targets": [
{
"expr": "up",
"legendFormat": "{{instance}}"
}
],
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"thresholds": {
"steps": [
{
"color": "red",
"value": 0
},
{
"color": "green",
"value": 1
}
]
}
}
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
}
},
{
"id": 2,
"title": "请求率",
"type": "graph",
"targets": [
{
"expr": "rate(http_server_requests_seconds_count[5m])",
"legendFormat": "{{instance}} - {{method}} {{uri}}"
}
],
"yAxes": [
{
"label": "请求/秒",
"min": 0
}
],
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
}
},
{
"id": 3,
"title": "响应时间",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m]))",
"legendFormat": "95%响应时间"
},
{
"expr": "histogram_quantile(0.50, rate(http_server_requests_seconds_bucket[5m]))",
"legendFormat": "50%响应时间"
}
],
"yAxes": [
{
"label": "秒",
"min": 0
}
],
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
}
},
{
"id": 4,
"title": "错误率",
"type": "graph",
"targets": [
{
"expr": "rate(http_server_requests_seconds_count{status=~\"5..\"}[5m])",
"legendFormat": "5xx错误率"
},
{
"expr": "rate(http_server_requests_seconds_count{status=~\"4..\"}[5m])",
"legendFormat": "4xx错误率"
}
],
"yAxes": [
{
"label": "错误/秒",
"min": 0
}
],
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
}
},
{
"id": 5,
"title": "JVM内存使用",
"type": "graph",
"targets": [
{
"expr": "jvm_memory_used_bytes{area=\"heap\"}",
"legendFormat": "堆内存使用"
},
{
"expr": "jvm_memory_max_bytes{area=\"heap\"}",
"legendFormat": "堆内存最大值"
}
],
"yAxes": [
{
"label": "字节",
"min": 0
}
],
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 16
}
},
{
"id": 6,
"title": "业务指标",
"type": "graph",
"targets": [
{
"expr": "rate(business_operation_total[5m])",
"legendFormat": "业务操作率"
},
{
"expr": "business_queue_size",
"legendFormat": "队列大小"
}
],
"yAxes": [
{
"label": "操作/秒",
"min": 0
}
],
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 16
}
}
]
}
}
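Most panels above wrap a counter in `rate(...[5m])`. To make that less opaque, here is a deliberately simplified JDK-only sketch of a per-second rate computed from two counter samples, plus the error ratio built from two such rates. It assumes only two samples and a single possible counter reset between them; the real `rate()` works over all samples in the window and extrapolates to its boundaries. `CounterRateSketch` is a hypothetical name.

```java
/** Hypothetical two-sample simplification of a counter rate and an error ratio. */
final class CounterRateSketch {

    /**
     * Per-second increase between two samples of a counter that only goes up.
     *
     * @param first          earlier sample value
     * @param last           later sample value
     * @param elapsedSeconds time between the two samples, must be positive
     */
    static double ratePerSecond(double first, double last, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        // A drop means the process restarted and the counter started again from zero,
        // so everything in the later sample counts as new increase.
        double increase = last >= first ? last - first : last;
        return increase / elapsedSeconds;
    }

    /** Share of failing requests, or 0 when there was no traffic at all. */
    static double errorRatio(double errorRate, double totalRate) {
        return totalRate == 0 ? 0.0 : errorRate / totalRate;
    }
}
```

For example, a counter going from 100 to 400 over 300 seconds yields one request per second, and 0.5 errors per second against 5 requests per second is an error ratio of 0.1.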
4.2 Alert configuration
Grafana alert configuration:
json
{
"alert": {
"id": 1,
"name": "高错误率告警",
"message": "服务错误率超过阈值",
"frequency": "10s",
"handler": 1,
"enabled": true,
"executionError": "alerting",
"for": "2m",
"conditions": [
{
"evaluator": {
"params": [0.1],
"type": "gt"
},
"operator": {
"type": "and"
},
"query": {
"params": ["A", "5m", "now"]
},
"reducer": {
"params": [],
"type": "last"
},
"type": "query"
}
],
"settings": {
"noDataState": "no_data",
"executionErrorState": "alerting"
}
}
}
5. APM performance monitoring
5.1 Application performance monitoring
APM configuration:
java
// APM configuration. The MeterRegistry, TimedAspect and CountedAspect beans are already
// declared in MetricsConfig (section 1.2). Declaring them again under the same bean names
// fails at startup when bean definition overriding is disabled, which is the Spring Boot default.
@Configuration
@EnableAspectJAutoProxy
@Slf4j
public class APMConfig {
}
// Performance monitoring aspect
@Aspect
@Component
@Slf4j
public class PerformanceMonitoringAspect {
@Autowired
private MeterRegistry meterRegistry;
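// Note: @annotation matches only methods annotated directly with @RequestMapping;
// methods that use @GetMapping or @PostMapping are not matched by this pointcut
@Around("@annotation(org.springframework.web.bind.annotation.RequestMapping)")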
public Object monitorRequest(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
String className = joinPoint.getTarget().getClass().getSimpleName();
String operation = className + "." + methodName;
Timer.Sample sample = Timer.start(meterRegistry);
try {
Object result = joinPoint.proceed();
sample.stop(Timer.builder("http.request.duration")
.tag("method", methodName)
.tag("class", className)
.tag("status", "success")
.register(meterRegistry));
return result;
} catch (Exception e) {
sample.stop(Timer.builder("http.request.duration")
.tag("method", methodName)
.tag("class", className)
.tag("status", "error")
.register(meterRegistry));
throw e;
}
}
@Around("@annotation(org.springframework.transaction.annotation.Transactional)")
public Object monitorTransaction(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
String className = joinPoint.getTarget().getClass().getSimpleName();
String operation = className + "." + methodName;
Timer.Sample sample = Timer.start(meterRegistry);
try {
Object result = joinPoint.proceed();
sample.stop(Timer.builder("transaction.duration")
.tag("method", methodName)
.tag("class", className)
.tag("status", "success")
.register(meterRegistry));
return result;
} catch (Exception e) {
sample.stop(Timer.builder("transaction.duration")
.tag("method", methodName)
.tag("class", className)
.tag("status", "error")
.register(meterRegistry));
throw e;
}
}
}
// Performance monitoring service
@Service
@Slf4j
public class PerformanceMonitoringService {
@Autowired
private MeterRegistry meterRegistry;
public void recordDatabaseOperation(String operation, Duration duration) {
Timer.builder("database.operation.duration")
.tag("operation", operation)
.register(meterRegistry)
.record(duration);
log.info("数据库操作记录: {} 耗时: {}ms", operation, duration.toMillis());
}
public void recordCacheOperation(String operation, Duration duration) {
Timer.builder("cache.operation.duration")
.tag("operation", operation)
.register(meterRegistry)
.record(duration);
log.info("缓存操作记录: {} 耗时: {}ms", operation, duration.toMillis());
}
public void recordExternalServiceCall(String service, String operation, Duration duration) {
Timer.builder("external.service.duration")
.tag("service", service)
.tag("operation", operation)
.register(meterRegistry)
.record(duration);
log.info("外部服务调用记录: {} {} 耗时: {}ms", service, operation, duration.toMillis());
}
}
5.2 Performance tuning suggestions
Performance tuning configuration:
java
// Performance tuning configuration
@Configuration
@EnableAsync
@EnableScheduling
@Slf4j
public class PerformanceOptimizationConfig {
@Bean
public TaskExecutor taskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(50);
executor.setQueueCapacity(100);
executor.setThreadNamePrefix("async-");
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
executor.initialize();
return executor;
}
@Bean
public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory connectionFactory) {
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(connectionFactory);
// 使用Jackson序列化
Jackson2JsonRedisSerializer<Object> serializer = new Jackson2JsonRedisSerializer<>(Object.class);
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
objectMapper.activateDefaultTyping(LaissezFaireSubTypeValidator.instance, ObjectMapper.DefaultTyping.NON_FINAL);
serializer.setObjectMapper(objectMapper);
template.setKeySerializer(new StringRedisSerializer());
template.setValueSerializer(serializer);
template.setHashKeySerializer(new StringRedisSerializer());
template.setHashValueSerializer(serializer);
return template;
}
@Bean
public HikariDataSource dataSource() {
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://localhost:3306/microservices");
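// Credentials are read from the environment (the MYSQL_USERNAME and MYSQL_PASSWORD variables
// that the Kubernetes deployment in section 6.2 injects from a Secret) rather than hard-coded
config.setUsername(System.getenv("MYSQL_USERNAME"));
config.setPassword(System.getenv("MYSQL_PASSWORD"));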
config.setDriverClassName("com.mysql.cj.jdbc.Driver");
// 连接池配置
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);
config.setLeakDetectionThreshold(60000);
// 性能优化配置
config.addDataSourceProperty("cachePrepStmts", "true");
config.addDataSourceProperty("prepStmtCacheSize", "250");
config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
config.addDataSourceProperty("useServerPrepStmts", "true");
config.addDataSourceProperty("useLocalSessionState", "true");
config.addDataSourceProperty("rewriteBatchedStatements", "true");
config.addDataSourceProperty("cacheResultSetMetadata", "true");
config.addDataSourceProperty("cacheServerConfiguration", "true");
config.addDataSourceProperty("elideSetAutoCommits", "true");
config.addDataSourceProperty("maintainTimeStats", "false");
return new HikariDataSource(config);
}
}
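The executor above combines a bounded queue with `CallerRunsPolicy`, which applies back-pressure once both the pool and the queue are full. The JDK-only sketch below demonstrates that behavior with a deliberately tiny pool (one thread, queue capacity one) so the overflow is easy to trigger. `CallerRunsSketch` is a hypothetical name and the sizes are chosen for the demonstration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demonstration of CallerRunsPolicy when pool and queue are saturated. */
final class CallerRunsSketch {

    /** Returns the name of the thread that ran the task submitted after saturation. */
    static String threadOfOverflowTask() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // hold the worker until the demonstration is over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        AtomicReference<String> overflowThread = new AtomicReference<>();
        try {
            pool.execute(blocker); // occupies the single worker thread
            pool.execute(blocker); // fills the single queue slot
            // Pool and queue are full, so this task is rejected and the
            // policy runs it synchronously on the submitting thread.
            pool.execute(() -> overflowThread.set(Thread.currentThread().getName()));
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return overflowThread.get();
    }
}
```

The trade-off is that a saturated pool slows down the submitting thread, for example a request thread, instead of dropping work or throwing an exception.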
6. DevOps practices
6.1 CI/CD pipeline configuration
Jenkins Pipeline configuration:
groovy
// Jenkinsfile
pipeline {
agent any
environment {
DOCKER_REGISTRY = 'registry.example.com'
IMAGE_NAME = 'microservices'
VERSION = "${env.BUILD_NUMBER}"
}
stages {
stage('Checkout') {
steps {
checkout scm
}
}
stage('Build') {
steps {
sh 'mvn clean compile'
}
}
stage('Test') {
steps {
sh 'mvn test'
}
post {
always {
// The JUnit plugin's junit step archives the Surefire XML reports
junit 'target/surefire-reports/*.xml'
}
}
}
stage('Code Quality') {
steps {
sh 'mvn sonar:sonar'
}
}
stage('Package') {
steps {
sh 'mvn package -DskipTests'
}
}
stage('Docker Build') {
steps {
script {
docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION}")
}
}
}
stage('Docker Push') {
steps {
script {
docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
docker.image("${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION}").push()
}
}
}
}
stage('Deploy to Staging') {
steps {
script {
sh "kubectl set image deployment/user-service user-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n staging"
sh "kubectl set image deployment/order-service order-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n staging"
}
}
}
stage('Integration Tests') {
steps {
sh 'mvn verify -Pintegration-tests'
}
}
stage('Deploy to Production') {
when {
branch 'main'
}
steps {
script {
sh "kubectl set image deployment/user-service user-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n production"
sh "kubectl set image deployment/order-service order-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n production"
}
}
}
}
post {
always {
cleanWs()
}
success {
emailext (
subject: "构建成功: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
body: "构建成功完成",
to: "dev-team@example.com"
)
}
failure {
emailext (
subject: "构建失败: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
body: "构建失败,请检查日志",
to: "dev-team@example.com"
)
}
}
}
6.2 Kubernetes deployment configuration
Kubernetes deployment manifests:
yaml
# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: registry.example.com/microservices:latest
          ports:
            - containerPort: 8081
          env:
            # Must match the profile name used in logback-spring.xml (springProfile name="prod"),
            # otherwise the JSON logging configuration is never activated
            - name: SPRING_PROFILES_ACTIVE
              value: "prod"
            - name: NACOS_SERVER_ADDR
              value: "nacos:8848"
            - name: MYSQL_HOST
              value: "mysql"
            - name: MYSQL_PORT
              value: "3306"
            - name: MYSQL_DATABASE
              value: "userdb"
            - name: MYSQL_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: username
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8081
            initialDelaySeconds: 60
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /actuator/health
              port: 8081
            initialDelaySeconds: 30
            periodSeconds: 10
          volumeMounts:
            - name: config-volume
              mountPath: /app/config
      volumes:
        - name: config-volume
          configMap:
            name: user-service-config
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  selector:
    app: user-service
  ports:
    - protocol: TCP
      port: 8081
      targetPort: 8081
  type: ClusterIP
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-service-config
data:
  application.yml: |
    spring:
      application:
        name: user-service
      cloud:
        nacos:
          discovery:
            server-addr: ${NACOS_SERVER_ADDR}
          config:
            server-addr: ${NACOS_SERVER_ADDR}
      datasource:
        url: jdbc:mysql://${MYSQL_HOST}:${MYSQL_PORT}/${MYSQL_DATABASE}
        username: ${MYSQL_USERNAME}
        password: ${MYSQL_PASSWORD}
    management:
      endpoints:
        web:
          exposure:
            include: "*"
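7. Summary and exercises
7.1 Key points
- Microservice monitoring
  - Spring Boot Actuator health checks
  - Custom metrics
  - Business metrics collection
- ELK log analysis
  - Elasticsearch index management
  - Logstash log processing
  - Kibana visual analysis
- Prometheus monitoring
  - Metrics collection and storage
  - Alerting rules
  - Custom metrics collection
- Grafana visualization
  - Dashboard design
  - Alert configuration
  - Data visualization
- APM performance monitoring
  - Application performance monitoring
  - Performance tuning suggestions
  - Performance metrics analysis
- DevOps practices
  - CI/CD pipelines
  - Kubernetes deployment
  - Automated operations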
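7.2 Exercises
Exercise 1: Set up the monitoring system
- Configure Spring Boot Actuator
- Implement custom health checks
- Configure business metrics
Exercise 2: ELK log analysis
- Configure an Elasticsearch cluster
- Set up a Logstash pipeline
- Design a Kibana dashboard
Exercise 3: Prometheus monitoring
- Configure the Prometheus server
- Define alerting rules
- Implement custom metrics collection
Exercise 4: Grafana visualization
- Design a monitoring dashboard
- Configure alert notifications
- Build data visualizations
Exercise 5: APM performance monitoring
- Implement the performance monitoring aspect
- Configure performance tuning parameters
- Analyze performance bottlenecks
Exercise 6: DevOps practices
- Configure a CI/CD pipeline
- Deploy to Kubernetes
- Set up automated operations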
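7.3 Study tips
- Theory: build a solid understanding of the core monitoring and operations concepts
- Practice: set up a monitoring system in a real project
- Tooling: become fluent with the monitoring and operations tools covered here
- Performance: learn how to analyze and tune system performance
- Automation: adopt DevOps best practices
8. Coming up next
Day 27 will cover:
- Microservice security: OAuth2, JWT, API security
- Service mesh: managing services with Istio
- Cloud-native architecture: 12-Factor App, cloud-native design
- Performance tuning: JVM tuning, database optimization
- Troubleshooting: problem diagnosis and solutions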