Java Learning Day 26 - Microservice Monitoring and Operations in Practice

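Learning Objectives

Master the complete technology stack for microservice monitoring and operations: health checks with Spring Boot Actuator, log analysis with the ELK stack, metrics monitoring with Prometheus, dashboard design with Grafana, APM performance monitoring, and DevOps automation practices.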


1. Microservice Monitoring Basics

1.1 Spring Boot Actuator Health Checks

Actuator configuration:

yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
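        # "*" exposes every endpoint over HTTP; narrow this list (for example health,info,prometheus) in production
        include: "*"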
      base-path: /actuator
  endpoint:
    health:
      show-details: when-authorized
      show-components: always
    info:
      enabled: true
    metrics:
      enabled: true
    prometheus:
      enabled: true
  health:
    defaults:
      enabled: true
    redis:
      enabled: true
    db:
      enabled: true
    diskspace:
      enabled: true
      threshold: 100MB
  metrics:
    export:
      prometheus:
        # Spring Boot 2.x property; in Spring Boot 3.x it is management.prometheus.metrics.export.enabled
        enabled: true
    distribution:
      percentiles-histogram:
        http.server.requests: true
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99
      slo:
        http.server.requests: 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s
  info:
    env:
      enabled: true
    java:
      enabled: true
    os:
      enabled: true
    build:
      enabled: true
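
The `slo` boundaries configured above are published as cumulative histogram buckets, and the Prometheus function `histogram_quantile` later estimates percentiles from them by linear interpolation inside a bucket. The following is a minimal sketch of that interpolation in plain Java; the class name, method name and sample counts are hypothetical and only illustrate the arithmetic, not Prometheus' actual implementation.

```java
// Hypothetical helper: estimates a quantile from cumulative histogram buckets
// using linear interpolation inside the bucket that contains the target rank.
class HistogramQuantileSketch {

    /**
     * @param q           quantile in [0, 1]
     * @param upperBounds bucket upper bounds ("le") in seconds, ascending; the last may be +Infinity
     * @param cumulative  cumulative observation count per bucket (same length as upperBounds)
     */
    static double quantile(double q, double[] upperBounds, long[] cumulative) {
        long total = cumulative[cumulative.length - 1];
        if (total == 0) {
            return Double.NaN; // no observations at all
        }
        double rank = q * total;
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulative[i] >= rank) {
                if (Double.isInfinite(upperBounds[i])) {
                    // Rank falls into the +Inf bucket: report the highest finite bound
                    return i == 0 ? Double.NaN : upperBounds[i - 1];
                }
                double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
                long below = (i == 0) ? 0 : cumulative[i - 1];
                long inBucket = cumulative[i] - below;
                if (inBucket == 0) {
                    return lower; // only reachable for q = 0 with an empty first bucket
                }
                return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
            }
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        // Bucket bounds mirror the slo list: 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s, +Inf
        double[] le = {0.05, 0.1, 0.2, 0.5, 1, 2, 5, Double.POSITIVE_INFINITY};
        long[] counts = {40, 70, 90, 97, 99, 100, 100, 100}; // made-up sample data
        System.out.println("p50 = " + quantile(0.5, le, counts));
        System.out.println("p95 = " + quantile(0.95, le, counts));
    }
}
```

Because the estimate can never be finer than the bucket it lands in, bucket boundaries are usually placed around the latencies that matter for alerting.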

Custom health indicators:

java
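// Database health check
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class DatabaseHealthIndicator implements HealthIndicator {
    
    @Autowired
    private DataSource dataSource;
    
    @Override
    public Health health() {
        long start = System.nanoTime();
        try (Connection connection = dataSource.getConnection()) {
            // isValid(1) asks the driver to validate the connection within 1 second
            if (connection.isValid(1)) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                return Health.up()
                    .withDetail("database", "MySQL")
                    .withDetail("validation", "Connection.isValid(1)")
                    // Elapsed time of the check, not a wall-clock timestamp
                    .withDetail("responseTimeMs", elapsedMs)
                    .build();
            }
        } catch (SQLException e) {
            log.error("Database health check failed", e);
            return Health.down()
                .withDetail("database", "MySQL")
                .withDetail("error", e.getMessage())
                .build();
        }
        return Health.down()
            .withDetail("database", "MySQL")
            .withDetail("error", "Connection is not valid")
            .build();
    }
}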

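// Redis health check
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class RedisHealthIndicator implements HealthIndicator {
    
    @Autowired
    private RedisTemplate<String, String> redisTemplate;
    
    @Override
    public Health health() {
        long start = System.nanoTime();
        try {
            // execute(...) borrows a connection and releases it afterwards, which
            // avoids leaking the connection obtained from the connection factory
            String pong = redisTemplate.execute((RedisCallback<String>) connection -> connection.ping());
            
            if ("PONG".equals(pong)) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                return Health.up()
                    .withDetail("redis", "Redis")
                    .withDetail("response", pong)
                    .withDetail("responseTimeMs", elapsedMs)
                    .build();
            }
        } catch (Exception e) {
            log.error("Redis health check failed", e);
            return Health.down()
                .withDetail("redis", "Redis")
                .withDetail("error", e.getMessage())
                .build();
        }
        return Health.down()
            .withDetail("redis", "Redis")
            .withDetail("error", "Connection failed")
            .build();
    }
}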

// Business health check
@Component
@Slf4j
public class BusinessHealthIndicator implements HealthIndicator {
    
    @Autowired
    private UserService userService;
    
    @Autowired
    private OrderService orderService;
    
    @Override
    public Health health() {
        try {
            // Check the user service
            long userCount = userService.getUserCount();
            if (userCount < 0) {
                return Health.down()
                    .withDetail("userService", "User service is unhealthy")
                    .withDetail("userCount", userCount)
                    .build();
            }
            
            // Check the order service
            long orderCount = orderService.getOrderCount();
            if (orderCount < 0) {
                return Health.down()
                    .withDetail("orderService", "Order service is unhealthy")
                    .withDetail("orderCount", orderCount)
                    .build();
            }
            
            return Health.up()
                .withDetail("userService", "OK")
                .withDetail("userCount", userCount)
                .withDetail("orderService", "OK")
                .withDetail("orderCount", orderCount)
                .build();
        } catch (Exception e) {
            log.error("Business health check failed", e);
            return Health.down()
                .withDetail("error", e.getMessage())
                .build();
        }
    }
}
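
Each indicator above reports its own status, and the `/actuator/health` endpoint combines them into one overall status. As far as the default behavior goes, the most severe component status wins, in the order DOWN, OUT_OF_SERVICE, UP, UNKNOWN. The sketch below reproduces that rule in plain Java; the enum and class are hypothetical stand-ins, not the Spring Boot types.

```java
import java.util.List;

// Hypothetical stand-in for the health aggregation rule: the most severe status wins.
class HealthAggregationSketch {

    // Declared from most to least severe, mirroring Spring Boot's default status order
    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    static Status aggregate(List<Status> componentStatuses) {
        Status worst = Status.UNKNOWN; // also the result for an empty list
        for (Status status : componentStatuses) {
            if (status.ordinal() < worst.ordinal()) {
                worst = status;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // db and redis are up, the business indicator is down -> overall DOWN
        System.out.println(aggregate(List.of(Status.UP, Status.UP, Status.DOWN)));
    }
}
```

A consequence worth keeping in mind: a single failing custom indicator marks the whole instance as DOWN, so business checks that are merely informative are better reported as details than as a DOWN status.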

1.2 Custom Metrics Monitoring

Business metrics monitoring:

java
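// Custom metrics configuration
import io.micrometer.core.aop.CountedAspect;
import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@Slf4j
public class MetricsConfig {
    
    // No MeterRegistry bean is declared here. Spring Boot auto-configures the
    // registry (including the Prometheus one when micrometer-registry-prometheus
    // is on the classpath); a hand-made SimpleMeterRegistry bean can keep custom
    // metrics out of /actuator/prometheus. @EnableMetrics is not part of
    // Spring Boot or Micrometer and is not needed.
    
    // Enables the @Timed annotation on Spring beans (requires Spring AOP)
    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
    
    // Enables the @Counted annotation on Spring beans
    @Bean
    public CountedAspect countedAspect(MeterRegistry registry) {
        return new CountedAspect(registry);
    }
}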

// Business metrics service
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.math.BigDecimal;
import java.time.Duration;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class BusinessMetricsService {
    
    private final Counter userRegistrationCounter;
    private final Counter orderCreationCounter;
    private final Timer orderProcessingTimer;
    private final Gauge activeUserGauge;
    private final DistributionSummary orderAmountSummary;
    
    public BusinessMetricsService(MeterRegistry meterRegistry) {
        this.userRegistrationCounter = Counter.builder("user.registration.total")
            .description("Total number of user registrations")
            .tag("type", "registration")
            .register(meterRegistry);
        
        this.orderCreationCounter = Counter.builder("order.creation.total")
            .description("Total number of orders created")
            .tag("type", "creation")
            .register(meterRegistry);
        
        this.orderProcessingTimer = Timer.builder("order.processing.duration")
            .description("Order processing time")
            .register(meterRegistry);
        
        // Gauge.builder takes the observed object and the value function up front;
        // register(...) only takes the registry
        this.activeUserGauge = Gauge.builder("user.active.count", this, BusinessMetricsService::getActiveUserCount)
            .description("Number of active users")
            .register(meterRegistry);
        
        this.orderAmountSummary = DistributionSummary.builder("order.amount.distribution")
            .description("Order amount distribution")
            .baseUnit("CNY")
            .register(meterRegistry);
    }
    
    public void incrementUserRegistration() {
        userRegistrationCounter.increment();
        log.info("User registration metric updated");
    }
    
    public void incrementOrderCreation() {
        orderCreationCounter.increment();
        log.info("Order creation metric updated");
    }
    
    public void recordOrderProcessingTime(Duration duration) {
        orderProcessingTimer.record(duration);
        log.info("Order processing time recorded: {}ms", duration.toMillis());
    }
    
    public void recordOrderAmount(BigDecimal amount) {
        orderAmountSummary.record(amount.doubleValue());
        log.info("Order amount recorded: {}", amount);
    }
    
    private double getActiveUserCount() {
        // Real business logic would look up the number of active users
        return 1000.0; // example value
    }
}

// Metrics monitoring aspect
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Aspect
@Component
@Slf4j
public class MetricsAspect {
    
    @Autowired
    private BusinessMetricsService metricsService;
    
    // The original pointcut also required @PostMapping, which sits on controller
    // methods rather than on service methods, so the combined pointcut would
    // normally match nothing. Only the service-layer execution pattern is kept.
    @Around("execution(* com.example.service.*.create*(..))")
    public Object monitorCreateOperations(ProceedingJoinPoint joinPoint) throws Throwable {
        String methodName = joinPoint.getSignature().getName();
        long startTime = System.currentTimeMillis();
        
        try {
            Object result = joinPoint.proceed();
            
            // Record success metrics
            if (methodName.contains("User")) {
                metricsService.incrementUserRegistration();
            } else if (methodName.contains("Order")) {
                metricsService.incrementOrderCreation();
            }
            
            return result;
        } catch (Exception e) {
            // Log the failure and rethrow
            log.error("Operation failed: {}", methodName, e);
            throw e;
        } finally {
            long duration = System.currentTimeMillis() - startTime;
            // Message text kept verbatim ("方法执行时间" = method execution time):
            // the Logstash grok pattern in section 2.2 matches on it
            log.info("方法执行时间: {}ms", duration);
        }
    }
    
    @Around("execution(* com.example.service.*.get*(..))")
    public Object monitorGetOperations(ProceedingJoinPoint joinPoint) throws Throwable {
        long startTime = System.currentTimeMillis();
        
        try {
            return joinPoint.proceed();
        } finally {
            long duration = System.currentTimeMillis() - startTime;
            // Message text kept verbatim ("查询方法执行时间" = query method execution time)
            log.info("查询方法执行时间: {}ms", duration);
        }
    }
}
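
The aspect above wraps a call in try/finally so that the duration is recorded whether the call succeeds or fails. The same pattern can be sketched without Spring or Micrometer, which makes the control flow easier to see. Everything in this block (class, field and method names) is hypothetical; it also uses `System.nanoTime()`, which is monotonic and therefore generally a safer choice for measuring elapsed time than `System.currentTimeMillis()`.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical plain-Java version of the "time and count around a call" pattern.
class TimedCallSketch {
    final LongAdder successCount = new LongAdder();
    final LongAdder failureCount = new LongAdder();
    final LongAdder totalNanos = new LongAdder();

    <T> T timed(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            T result = call.get();
            successCount.increment();
            return result;
        } catch (RuntimeException e) {
            failureCount.increment();
            throw e; // the caller still sees the original exception
        } finally {
            // Runs on both paths, so failed calls contribute to the timing too
            totalNanos.add(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        TimedCallSketch metrics = new TimedCallSketch();
        String order = metrics.timed(() -> "order-1");
        System.out.println(order + " successes=" + metrics.successCount.sum()
            + " failures=" + metrics.failureCount.sum());
    }
}
```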

2. ELK Log Analysis System

2.1 Elasticsearch Configuration

Elasticsearch configuration:

yaml
# elasticsearch.yml
cluster.name: microservices-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["localhost"]
cluster.initial_master_nodes: ["node-1"]

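# Index-level settings (number_of_shards, number_of_replicas, refresh_interval)
# are not accepted in elasticsearch.yml since Elasticsearch 5.x; they are set
# per index through an index template or the index settings API instead.

# Data and log paths
path:
  data: /usr/share/elasticsearch/data
  logs: /usr/share/elasticsearch/logs

# Memory settings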
bootstrap.memory_lock: true

Application logging configuration:

xml
<!-- logback-spring.xml -->
<configuration>
    <springProfile name="!prod">
        <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
        <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
        </root>
    </springProfile>
    
    <springProfile name="prod">
        <!-- Console output -->
        <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
            <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
                <providers>
                    <timestamp/>
                    <logLevel/>
                    <loggerName/>
                    <mdc/>
                    <message/>
                    <stackTrace/>
                </providers>
            </encoder>
        </appender>
        
        <!-- File output -->
        <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
            <file>/app/logs/application.log</file>
            <!-- %i and maxFileSize require the size-and-time based policy -->
            <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
                <fileNamePattern>/app/logs/application.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
                <maxFileSize>100MB</maxFileSize>
                <maxHistory>30</maxHistory>
                <totalSizeCap>3GB</totalSizeCap>
            </rollingPolicy>
            <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
                <providers>
                    <timestamp/>
                    <logLevel/>
                    <loggerName/>
                    <mdc/>
                    <message/>
                    <stackTrace/>
                </providers>
            </encoder>
        </appender>
        
        <!-- Asynchronous output -->
        <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
            <appender-ref ref="FILE"/>
            <queueSize>1024</queueSize>
            <discardingThreshold>0</discardingThreshold>
            <includeCallerData>true</includeCallerData>
        </appender>
        
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
            <appender-ref ref="ASYNC"/>
        </root>
    </springProfile>
</configuration>
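
The ASYNC appender above sets `queueSize` to 1024 and `discardingThreshold` to 0. In Logback, once the remaining queue capacity drops below the discarding threshold, events of level INFO and below are dropped; a threshold of 0 therefore keeps every event. The sketch below models just that rule in plain Java. It is a simplified illustration with hypothetical names, and unlike the real appender (which blocks by default when the queue is full) it simply reports a rejected event.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, simplified model of AsyncAppender's discarding rule.
class AsyncQueueSketch {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    private final BlockingQueue<String> queue;
    private final int discardingThreshold;

    AsyncQueueSketch(int queueSize, int discardingThreshold) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.discardingThreshold = discardingThreshold;
    }

    /** Returns true if the event was queued, false if it was dropped. */
    boolean append(Level level, String message) {
        boolean nearlyFull = queue.remainingCapacity() < discardingThreshold;
        boolean discardable = level.compareTo(Level.INFO) <= 0; // TRACE, DEBUG, INFO
        if (nearlyFull && discardable) {
            return false;
        }
        // Sketch only: offer() returns false on a full queue instead of blocking
        return queue.offer(message);
    }

    public static void main(String[] args) {
        AsyncQueueSketch lossy = new AsyncQueueSketch(10, 2);
        for (int i = 0; i < 9; i++) {
            lossy.append(Level.INFO, "event " + i);
        }
        System.out.println("INFO accepted: " + lossy.append(Level.INFO, "late info"));
        System.out.println("WARN accepted: " + lossy.append(Level.WARN, "late warn"));
    }
}
```

With threshold 0 nothing is discarded, so the trade-off moves from lost log lines to possible back-pressure on application threads when the queue fills up.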

2.2 Logstash Configuration

Logstash pipeline configuration:

ruby
# logstash.conf
input {
  beats {
    port => 5044
  }
  
  tcp {
    port => 5000
    codec => json_lines
  }
  
  file {
    path => "/app/logs/*.log"
    start_position => "beginning"
    codec => json
  }
}

filter {
  if [fields][service] == "user-service" {
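    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} \[%{DATA:traceId},%{DATA:spanId}\] %{DATA:logger} - %{GREEDYDATA:message}" }
      # Replace the message field with the captured remainder instead of appending to it
      overwrite => [ "message" ]
    }
    
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601" ]
    }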
    
    mutate {
      add_field => { "service_name" => "user-service" }
      add_field => { "environment" => "production" }
    }
  }
  
  if [fields][service] == "order-service" {
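    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} \[%{DATA:traceId},%{DATA:spanId}\] %{DATA:logger} - %{GREEDYDATA:message}" }
      # Replace the message field with the captured remainder instead of appending to it
      overwrite => [ "message" ]
    }
    
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601" ]
    }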
    
    mutate {
      add_field => { "service_name" => "order-service" }
      add_field => { "environment" => "production" }
    }
  }
  
  # Special handling for error logs
  if [level] == "ERROR" {
    mutate {
      add_field => { "alert_level" => "high" }
      add_field => { "notification_required" => "true" }
    }
  }
  
  # Performance log handling ("执行时间" = execution time, the text the application logs)
  if [message] =~ /执行时间/ {
    grok {
      match => { "message" => "方法执行时间: %{NUMBER:execution_time}ms" }
    }
    
    mutate {
      convert => { "execution_time" => "integer" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "microservices-logs-%{+YYYY.MM.dd}"
    template_name => "microservices-logs"
    template => "/usr/share/logstash/templates/microservices-logs.json"
    template_overwrite => true
  }
  
  # Error logs additionally go to their own index
  if [level] == "ERROR" {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "error-logs-%{+YYYY.MM.dd}"
    }
  }
  
  # Performance logs additionally go to their own index
  if [execution_time] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "performance-logs-%{+YYYY.MM.dd}"
    }
  }
}
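
The performance filter above relies on the exact wording of the application's log line: the grok pattern looks for `方法执行时间: <number>ms` and stores the number in `execution_time`. A quick way to check a log message against that shape before deploying the pipeline is an equivalent regular expression. The sketch below is an approximation under stated assumptions: the class and method names are hypothetical, and `\d+` covers only non-negative integers, whereas grok's NUMBER also accepts decimals and signs.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker that mirrors the grok pattern "方法执行时间: %{NUMBER:execution_time}ms".
class ExecutionTimeParserSketch {

    // Unanchored, like grok: the text may appear anywhere in the message
    private static final Pattern EXECUTION_TIME = Pattern.compile("方法执行时间: (\\d+)ms");

    static OptionalInt executionTimeMs(String message) {
        Matcher matcher = EXECUTION_TIME.matcher(message);
        if (matcher.find()) {
            return OptionalInt.of(Integer.parseInt(matcher.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(executionTimeMs("方法执行时间: 123ms"));
        System.out.println(executionTimeMs("查询方法执行时间: 7ms"));
        System.out.println(executionTimeMs("用户注册指标更新"));
    }
}
```

Because the match is unanchored, the query variant of the message (`查询方法执行时间`) is picked up as well, which matches what the `/执行时间/` condition in the filter selects.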

2.3 Kibana Visualization Configuration

Index template behind the Kibana index pattern (`microservices-logs-*`):

json
{
  "index_patterns": ["microservices-logs-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.refresh_interval": "1s"
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "level": {
        "type": "keyword"
      },
      "service_name": {
        "type": "keyword"
      },
      "environment": {
        "type": "keyword"
      },
      "traceId": {
        "type": "keyword"
      },
      "spanId": {
        "type": "keyword"
      },
      "message": {
        "type": "text",
        "analyzer": "standard"
      },
      "execution_time": {
        "type": "integer"
      },
      "alert_level": {
        "type": "keyword"
      },
      "notification_required": {
        "type": "boolean"
      }
    }
  }
}

Kibana dashboard configuration:

json
{
  "version": "8.0.0",
  "kibana": {
    "version": "8.0.0"
  },
  "saved_objects": [
    {
      "id": "microservices-dashboard",
      "type": "dashboard",
      "attributes": {
        "title": "Microservices Monitoring Dashboard",
        "panelsJSON": "[{\"version\":\"8.0.0\",\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":15,\"i\":\"1\"},\"panelIndex\":\"1\",\"embeddableConfig\":{\"savedVis\":{\"title\":\"Error log trend\",\"visState\":\"{\\\"title\\\":\\\"Error log trend\\\",\\\"type\\\":\\\"histogram\\\",\\\"params\\\":{\\\"grid\\\":{\\\"categoryLines\\\":false,\\\"style\\\":{\\\"color\\\":\\\"#eee\\\"}},\\\"categoryAxes\\\":[{\\\"id\\\":\\\"CategoryAxis-1\\\",\\\"type\\\":\\\"category\\\",\\\"position\\\":\\\"bottom\\\",\\\"show\\\":true,\\\"style\\\":{},\\\"scale\\\":{\\\"type\\\":\\\"linear\\\"},\\\"labels\\\":{\\\"show\\\":true,\\\"truncate\\\":100},\\\"title\\\":{}}],\\\"valueAxes\\\":[{\\\"id\\\":\\\"ValueAxis-1\\\",\\\"name\\\":\\\"LeftAxis-1\\\",\\\"type\\\":\\\"value\\\",\\\"position\\\":\\\"left\\\",\\\"show\\\":true,\\\"style\\\":{},\\\"scale\\\":{\\\"type\\\":\\\"linear\\\",\\\"mode\\\":\\\"normal\\\"},\\\"labels\\\":{\\\"show\\\":true,\\\"rotate\\\":0,\\\"filter\\\":false,\\\"truncate\\\":100},\\\"title\\\":{\\\"text\\\":\\\"Count\\\"}}],\\\"seriesParams\\\":[{\\\"data\\\":{\\\"id\\\":\\\"1\\\",\\\"label\\\":\\\"Count\\\"},\\\"type\\\":\\\"histogram\\\",\\\"mode\\\":\\\"stacked\\\",\\\"show\\\":true,\\\"valueAxis\\\":\\\"ValueAxis-1\\\",\\\"drawLinesBetweenPoints\\\":true,\\\"showCircles\\\":true}],\\\"addTooltip\\\":true,\\\"addLegend\\\":true,\\\"legendPosition\\\":\\\"right\\\",\\\"times\\\":[],\\\"addTimeMarker\\\":false},\\\"aggs\\\":[{\\\"id\\\":\\\"1\\\",\\\"type\\\":\\\"count\\\",\\\"schema\\\":\\\"metric\\\",\\\"params\\\":{}},{\\\"id\\\":\\\"2\\\",\\\"type\\\":\\\"date_histogram\\\",\\\"schema\\\":\\\"segment\\\",\\\"params\\\":{\\\"field\\\":\\\"@timestamp\\\",\\\"interval\\\":\\\"auto\\\",\\\"customInterval\\\":\\\"2h\\\",\\\"min_doc_count\\\":1,\\\"extended_bounds\\\":{}}},{\\\"id\\\":\\\"3\\\",\\\"type\\\":\\\"filters\\\",\\\"schema\\\":\\\"group\\\",\\\"params\\\":{\\\"filters\\\":[{\\\"input\\\":{\\\"query\\\":{\\\"query_string\\\":{\\\"query\\\":\\\"level:ERROR\\\",\\\"analyze_wildcard\\\":true}}},\\\"label\\\":\\\"Error\\\"}]}}]}\"}}}]"
      }
    }
  ]
}

3. Prometheus Metrics Monitoring

3.1 Prometheus Configuration

Prometheus configuration:

yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  
  - job_name: 'user-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['user-service:8081']
    scrape_interval: 5s
  
  - job_name: 'order-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['order-service:8082']
    scrape_interval: 5s
  
  - job_name: 'api-gateway'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api-gateway:8080']
    scrape_interval: 5s
  
  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql-exporter:9104']
  
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

Alert rule configuration:

yaml
# alert_rules.yml
groups:
- name: microservices
  rules:
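  - alert: HighErrorRate
    # Ratio of 5xx requests to all requests per instance, so that 0.1 really means 10%
    expr: sum by (instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) / sum by (instance) (rate(http_server_requests_seconds_count[5m])) > 0.1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High error rate"
      description: "Service {{ $labels.instance }} has a 5xx error ratio above 10%, current value: {{ $value }}"
  
  - alert: HighResponseTime
    # Buckets are summed per instance before the quantile is estimated
    expr: histogram_quantile(0.95, sum by (le, instance) (rate(http_server_requests_seconds_bucket[5m]))) > 1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High response time"
      description: "Service {{ $labels.instance }} has a 95th percentile response time above 1 second, current value: {{ $value }}s"
  
  - alert: ServiceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Service down"
      description: "Service {{ $labels.instance }} is down"
  
  - alert: HighMemoryUsage
    # Restricted to heap: some non-heap pools report a max of -1, which distorts the ratio
    expr: sum by (instance) (jvm_memory_used_bytes{area="heap"}) / sum by (instance) (jvm_memory_max_bytes{area="heap"}) > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage"
      description: "Service {{ $labels.instance }} has heap usage above 80%, current value: {{ $value }}"
  
  - alert: HighCPUUsage
    expr: system_cpu_usage > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage"
      description: "Service {{ $labels.instance }} has CPU usage above 80%, current value: {{ $value }}"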
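
The HighErrorRate description speaks of a percentage, which is a ratio of two rates rather than a raw per-second rate of 5xx responses. The arithmetic is sketched below in plain Java, under simplifying assumptions: `rate` here is just the counter increase divided by the window length, ignoring the counter resets and extrapolation that Prometheus handles. All names and sample numbers are hypothetical.

```java
// Hypothetical illustration of "error rate" versus "error ratio" for a 5-minute window.
class ErrorRatioSketch {

    /** Per-second increase between two samples of a monotonically increasing counter. */
    static double rate(double earlier, double later, double windowSeconds) {
        return (later - earlier) / windowSeconds;
    }

    /** Share of failed requests among all requests in the window, in [0, 1]. */
    static double errorRatio(double errorsStart, double errorsEnd,
                             double totalStart, double totalEnd, double windowSeconds) {
        double totalRate = rate(totalStart, totalEnd, windowSeconds);
        if (totalRate == 0) {
            return 0.0; // no traffic, so no meaningful ratio
        }
        return rate(errorsStart, errorsEnd, windowSeconds) / totalRate;
    }

    public static void main(String[] args) {
        double window = 300; // 5 minutes
        // Busy service: 45 errors out of 30000 requests
        System.out.println("error rate  = " + rate(0, 45, window) + " errors/s");
        System.out.println("error ratio = " + errorRatio(0, 45, 0, 30000, window));
    }
}
```

In this made-up sample the raw rate is above 0.1 errors per second although only a small fraction of requests fail, which is why a threshold of 0.1 behaves very differently depending on which of the two quantities it is applied to.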

3.2 Custom Metrics Collection

Custom metrics collector:

java
// Custom metrics collector
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.math.BigDecimal;
import java.time.Duration;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class CustomMetricsCollector {
    
    private final MeterRegistry meterRegistry;
    private final Gauge businessQueueSize;
    private final DistributionSummary businessAmountSummary;
    
    public CustomMetricsCollector(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        
        // Gauge.builder takes the observed object and the value function up front
        this.businessQueueSize = Gauge.builder("business.queue.size", this, CustomMetricsCollector::getQueueSize)
            .description("Business queue size")
            .register(meterRegistry);
        
        this.businessAmountSummary = DistributionSummary.builder("business.amount.distribution")
            .description("Business amount distribution")
            .baseUnit("CNY")
            .register(meterRegistry);
    }
    
    public void recordBusinessOperation(String operation, Duration duration) {
        // Counter.increment() does not accept tags: tags are part of a meter's
        // identity, so the counter and timer are resolved per "operation" tag.
        // register(...) returns the existing meter when it was already created.
        Counter.builder("business.operation.total")
            .description("Total number of business operations")
            .tag("operation", operation)
            .register(meterRegistry)
            .increment();
        Timer.builder("business.operation.duration")
            .description("Business operation duration")
            .tag("operation", operation)
            .register(meterRegistry)
            .record(duration);
        log.info("Business operation recorded: {} took {}ms", operation, duration.toMillis());
    }
    
    public void recordBusinessAmount(BigDecimal amount) {
        businessAmountSummary.record(amount.doubleValue());
        log.info("Business amount recorded: {}", amount);
    }
    
    private double getQueueSize() {
        // Real business logic would return the current queue size
        return 100.0; // example value
    }
}

// Metrics collection aspect
@Aspect
@Component
@Slf4j
public class MetricsCollectionAspect {
    
    @Autowired
    private CustomMetricsCollector metricsCollector;
    
    @Around("@annotation(org.springframework.web.bind.annotation.PostMapping)")
    public Object collectPostMetrics(ProceedingJoinPoint joinPoint) throws Throwable {
        String methodName = joinPoint.getSignature().getName();
        long startTime = System.currentTimeMillis();
        
        try {
            Object result = joinPoint.proceed();
            
            Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
            metricsCollector.recordBusinessOperation("POST_" + methodName, duration);
            
            return result;
        } catch (Exception e) {
            Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
            metricsCollector.recordBusinessOperation("POST_" + methodName + "_ERROR", duration);
            throw e;
        }
    }
    
    @Around("@annotation(org.springframework.web.bind.annotation.GetMapping)")
    public Object collectGetMetrics(ProceedingJoinPoint joinPoint) throws Throwable {
        String methodName = joinPoint.getSignature().getName();
        long startTime = System.currentTimeMillis();
        
        try {
            Object result = joinPoint.proceed();
            
            Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
            metricsCollector.recordBusinessOperation("GET_" + methodName, duration);
            
            return result;
        } catch (Exception e) {
            Duration duration = Duration.ofMillis(System.currentTimeMillis() - startTime);
            metricsCollector.recordBusinessOperation("GET_" + methodName + "_ERROR", duration);
            throw e;
        }
    }
}
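
The aspect above passes values such as `"POST_" + methodName + "_ERROR"` as the operation name, and every distinct value becomes its own time series in the monitoring backend. The sketch below shows that effect with a plain map standing in for a metrics registry; the class and method names are hypothetical and the identifier format is only illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a registry: one counter per (metric name, operation tag) pair.
class TaggedCounterSketch {
    private final Map<String, LongAdder> series = new ConcurrentHashMap<>();

    private static String id(String name, String operation) {
        return name + "{operation=\"" + operation + "\"}";
    }

    void increment(String name, String operation) {
        // computeIfAbsent creates the series on first use and reuses it afterwards
        series.computeIfAbsent(id(name, operation), key -> new LongAdder()).increment();
    }

    long count(String name, String operation) {
        LongAdder adder = series.get(id(name, operation));
        return adder == null ? 0 : adder.sum();
    }

    int seriesCount() {
        return series.size();
    }

    public static void main(String[] args) {
        TaggedCounterSketch registry = new TaggedCounterSketch();
        registry.increment("business.operation.total", "POST_createOrder");
        registry.increment("business.operation.total", "POST_createOrder");
        registry.increment("business.operation.total", "POST_createOrder_ERROR");
        System.out.println("series: " + registry.seriesCount());
    }
}
```

Keeping tag values to a small, bounded set (method names, outcome) keeps the number of series manageable; unbounded values such as user or order identifiers would make it grow without limit.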

4. Grafana Visualization Design

4.1 Grafana Dashboard Configuration

Grafana dashboard JSON:

json
{
  "dashboard": {
    "id": null,
    "title": "Microservices Monitoring Dashboard",
    "tags": ["microservices", "monitoring"],
    "style": "dark",
    "timezone": "browser",
    "refresh": "5s",
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "panels": [
      {
        "id": 1,
        "title": "Service status",
        "type": "stat",
        "targets": [
          {
            "expr": "up",
            "legendFormat": "{{instance}}"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "color": {
              "mode": "thresholds"
            },
            "thresholds": {
              "steps": [
                {
                  "color": "red",
                  "value": 0
                },
                {
                  "color": "green",
                  "value": 1
                }
              ]
            }
          }
        },
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 0
        }
      },
      {
        "id": 2,
        "title": "Request rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_server_requests_seconds_count[5m])",
            "legendFormat": "{{instance}} - {{method}} {{uri}}"
          }
        ],
        "yAxes": [
          {
            "label": "requests/s",
            "min": 0
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 0
        }
      },
      {
        "id": 3,
        "title": "Response time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m]))",
            "legendFormat": "95th percentile response time"
          },
          {
            "expr": "histogram_quantile(0.50, rate(http_server_requests_seconds_bucket[5m]))",
            "legendFormat": "50th percentile response time"
          }
        ],
        "yAxes": [
          {
            "label": "seconds",
            "min": 0
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 8
        }
      },
      {
        "id": 4,
        "title": "Error rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_server_requests_seconds_count{status=~\"5..\"}[5m])",
            "legendFormat": "5xx errors per second"
          },
          {
            "expr": "rate(http_server_requests_seconds_count{status=~\"4..\"}[5m])",
            "legendFormat": "4xx errors per second"
          }
        ],
        "yAxes": [
          {
            "label": "errors/s",
            "min": 0
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 8
        }
      },
      {
        "id": 5,
        "title": "JVM memory usage",
        "type": "graph",
        "targets": [
          {
            "expr": "jvm_memory_used_bytes{area=\"heap\"}",
            "legendFormat": "Heap used"
          },
          {
            "expr": "jvm_memory_max_bytes{area=\"heap\"}",
            "legendFormat": "Heap max"
          }
        ],
        "yAxes": [
          {
            "label": "bytes",
            "min": 0
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 0,
          "y": 16
        }
      },
      {
        "id": 6,
        "title": "Business metrics",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(business_operation_total[5m])",
            "legendFormat": "Business operation rate"
          },
          {
            "expr": "business_queue_size",
            "legendFormat": "Queue size"
          }
        ],
        "yAxes": [
          {
            "label": "operations/s",
            "min": 0
          }
        ],
        "gridPos": {
          "h": 8,
          "w": 12,
          "x": 12,
          "y": 16
        }
      }
    ]
  }
}
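
The `gridPos` values above place the six panels on Grafana's 24-column grid: each panel is 12 columns wide, so two fit side by side per row. When such JSON is edited by hand, a small check for panels that leave the grid or overlap can catch layout mistakes early. The following sketch is a hypothetical helper, not part of Grafana; only the 24-column width and the x/y/w/h fields come from the dashboard format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout check for hand-edited gridPos values.
class GridPosSketch {
    static final int GRID_COLUMNS = 24;

    static final class Panel {
        final int id, x, y, w, h;
        Panel(int id, int x, int y, int w, int h) {
            this.id = id; this.x = x; this.y = y; this.w = w; this.h = h;
        }
    }

    /** Two rectangles overlap when they intersect on both axes. */
    static boolean overlaps(Panel a, Panel b) {
        return a.x < b.x + b.w && b.x < a.x + a.w
            && a.y < b.y + b.h && b.y < a.y + a.h;
    }

    static List<String> validate(List<Panel> panels) {
        List<String> problems = new ArrayList<>();
        for (Panel p : panels) {
            if (p.x < 0 || p.x + p.w > GRID_COLUMNS) {
                problems.add("panel " + p.id + " exceeds the 24-column grid");
            }
        }
        for (int i = 0; i < panels.size(); i++) {
            for (int j = i + 1; j < panels.size(); j++) {
                if (overlaps(panels.get(i), panels.get(j))) {
                    problems.add("panels " + panels.get(i).id + " and " + panels.get(j).id + " overlap");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // The six panels of the dashboard above: id, x, y, w, h
        List<Panel> panels = List.of(
            new Panel(1, 0, 0, 12, 8), new Panel(2, 12, 0, 12, 8),
            new Panel(3, 0, 8, 12, 8), new Panel(4, 12, 8, 12, 8),
            new Panel(5, 0, 16, 12, 8), new Panel(6, 12, 16, 12, 8));
        System.out.println("problems: " + validate(panels));
    }
}
```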

4.2 Alert Configuration

Grafana alert configuration:

json
{
  "alert": {
    "id": 1,
    "name": "High error rate alert",
    "message": "Service error rate exceeded the threshold",
    "frequency": "10s",
    "handler": 1,
    "enabled": true,
    "executionError": "alerting",
    "for": "2m",
    "conditions": [
      {
        "evaluator": {
          "params": [0.1],
          "type": "gt"
        },
        "operator": {
          "type": "and"
        },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": {
          "params": [],
          "type": "last"
        },
        "type": "query"
      }
    ],
    "settings": {
      "noDataState": "no_data",
      "executionErrorState": "alerting"
    }
  }
}
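
The alert above combines a threshold (`gt` 0.1) with `"for": "2m"`: the condition has to hold continuously for two minutes before the alert fires, and a single evaluation below the threshold resets the wait. The Prometheus rules in section 3.1 use a `for` clause in the same way. A minimal sketch of that state machine follows; the class, enum and method names are hypothetical and timestamps are passed in explicitly to keep the example deterministic.

```java
// Hypothetical model of a threshold alert with a "for" (pending) duration.
class PendingAlertSketch {
    enum State { OK, PENDING, ALERTING }

    private final double threshold;
    private final long forMillis;
    private long breachedSince = -1; // -1 means the condition is currently not breached

    PendingAlertSketch(double threshold, long forMillis) {
        this.threshold = threshold;
        this.forMillis = forMillis;
    }

    State evaluate(double value, long nowMillis) {
        if (value <= threshold) {
            breachedSince = -1; // any healthy evaluation resets the pending timer
            return State.OK;
        }
        if (breachedSince < 0) {
            breachedSince = nowMillis;
        }
        return (nowMillis - breachedSince >= forMillis) ? State.ALERTING : State.PENDING;
    }

    public static void main(String[] args) {
        PendingAlertSketch alert = new PendingAlertSketch(0.1, 120_000);
        System.out.println(alert.evaluate(0.2, 0));
        System.out.println(alert.evaluate(0.2, 60_000));
        System.out.println(alert.evaluate(0.2, 120_000));
        System.out.println(alert.evaluate(0.05, 130_000));
    }
}
```

The pending period trades detection speed for fewer false alarms: short spikes never reach ALERTING, while a sustained problem is reported after the configured delay.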

5. APM Performance Monitoring

5.1 Application Performance Monitoring

APM configuration:

java
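// APM configuration class
@Configuration
@EnableAspectJAutoProxy
@Slf4j
public class APMConfig {
    
    // The MeterRegistry is left to Spring Boot's auto-configuration; a hand-made
    // SimpleMeterRegistry bean can keep metrics from reaching the Prometheus endpoint.
    
    // Note: TimedAspect and CountedAspect should be declared in only one
    // configuration class. If MetricsConfig from section 1.2 is part of the same
    // application, drop one of the two declarations to avoid duplicate beans.
    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
    
    @Bean
    public CountedAspect countedAspect(MeterRegistry registry) {
        return new CountedAspect(registry);
    }
}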

// Performance monitoring aspect
@Aspect
@Component
@Slf4j
public class PerformanceMonitoringAspect {
    
    @Autowired
    private MeterRegistry meterRegistry;
    
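    // @annotation(RequestMapping) alone does not match the composed annotations
    // (@GetMapping, @PostMapping, ...), so they are listed explicitly
    @Around("@annotation(org.springframework.web.bind.annotation.RequestMapping)"
        + " || @annotation(org.springframework.web.bind.annotation.GetMapping)"
        + " || @annotation(org.springframework.web.bind.annotation.PostMapping)"
        + " || @annotation(org.springframework.web.bind.annotation.PutMapping)"
        + " || @annotation(org.springframework.web.bind.annotation.DeleteMapping)")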
    public Object monitorRequest(ProceedingJoinPoint joinPoint) throws Throwable {
        String methodName = joinPoint.getSignature().getName();
        String className = joinPoint.getTarget().getClass().getSimpleName();
        String operation = className + "." + methodName;
        
        Timer.Sample sample = Timer.start(meterRegistry);
        
        try {
            Object result = joinPoint.proceed();
            
            sample.stop(Timer.builder("http.request.duration")
                .tag("method", methodName)
                .tag("class", className)
                .tag("status", "success")
                .register(meterRegistry));
            
            return result;
        } catch (Exception e) {
            sample.stop(Timer.builder("http.request.duration")
                .tag("method", methodName)
                .tag("class", className)
                .tag("status", "error")
                .register(meterRegistry));
            
            throw e;
        }
    }
    
    @Around("@annotation(org.springframework.transaction.annotation.Transactional)")
    public Object monitorTransaction(ProceedingJoinPoint joinPoint) throws Throwable {
        String methodName = joinPoint.getSignature().getName();
        String className = joinPoint.getTarget().getClass().getSimpleName();
        String operation = className + "." + methodName;
        
        Timer.Sample sample = Timer.start(meterRegistry);
        
        try {
            Object result = joinPoint.proceed();
            
            sample.stop(Timer.builder("transaction.duration")
                .tag("method", methodName)
                .tag("class", className)
                .tag("status", "success")
                .register(meterRegistry));
            
            return result;
        } catch (Exception e) {
            sample.stop(Timer.builder("transaction.duration")
                .tag("method", methodName)
                .tag("class", className)
                .tag("status", "error")
                .register(meterRegistry));
            
            throw e;
        }
    }
}

// Performance monitoring service
@Service
@Slf4j
public class PerformanceMonitoringService {
    
    @Autowired
    private MeterRegistry meterRegistry;
    
    public void recordDatabaseOperation(String operation, Duration duration) {
        Timer.builder("database.operation.duration")
            .tag("operation", operation)
            .register(meterRegistry)
            .record(duration);
        
        log.info("Database operation recorded: {} took {}ms", operation, duration.toMillis());
    }
    
    public void recordCacheOperation(String operation, Duration duration) {
        Timer.builder("cache.operation.duration")
            .tag("operation", operation)
            .register(meterRegistry)
            .record(duration);
        
        log.info("Cache operation recorded: {} took {}ms", operation, duration.toMillis());
    }
    
    public void recordExternalServiceCall(String service, String operation, Duration duration) {
        Timer.builder("external.service.duration")
            .tag("service", service)
            .tag("operation", operation)
            .register(meterRegistry)
            .record(duration);
        
        log.info("External service call recorded: {} {} took {}ms", service, operation, duration.toMillis());
    }
}

5.2 Performance Optimization Recommendations

Performance optimization configuration:

java
// Performance optimization configuration
@Configuration
@EnableAsync
@EnableScheduling
@Slf4j
public class PerformanceOptimizationConfig {
    
    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-");
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
    
    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory connectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(connectionFactory);
        
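        // Serialize values as JSON with Jackson
        Jackson2JsonRedisSerializer<Object> serializer = new Jackson2JsonRedisSerializer<>(Object.class);
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
        // Default typing stores class names in the JSON so values deserialize to their original type.
        // LaissezFaireSubTypeValidator accepts any class, so use this only with trusted Redis data.
        objectMapper.activateDefaultTyping(LaissezFaireSubTypeValidator.instance, ObjectMapper.DefaultTyping.NON_FINAL);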
        serializer.setObjectMapper(objectMapper);
        
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(serializer);
        template.setHashKeySerializer(new StringRedisSerializer());
        template.setHashValueSerializer(serializer);
        
        return template;
    }
    
    @Bean
    public HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
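        // Demo values only: in a real deployment, load the URL and credentials from external configuration
        config.setJdbcUrl("jdbc:mysql://localhost:3306/microservices");
        config.setUsername("root");
        config.setPassword("password");
        config.setDriverClassName("com.mysql.cj.jdbc.Driver");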
        
        // Connection pool sizing and timeouts (timeouts in milliseconds)
        config.setMaximumPoolSize(20);
        config.setMinimumIdle(5);
        config.setConnectionTimeout(30000);
        config.setIdleTimeout(600000);
        config.setMaxLifetime(1800000);
        config.setLeakDetectionThreshold(60000);
        
        // MySQL Connector/J performance properties
        config.addDataSourceProperty("cachePrepStmts", "true");
        config.addDataSourceProperty("prepStmtCacheSize", "250");
        config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
        config.addDataSourceProperty("useServerPrepStmts", "true");
        config.addDataSourceProperty("useLocalSessionState", "true");
        config.addDataSourceProperty("rewriteBatchedStatements", "true");
        config.addDataSourceProperty("cacheResultSetMetadata", "true");
        config.addDataSourceProperty("cacheServerConfiguration", "true");
        config.addDataSourceProperty("elideSetAutoCommits", "true");
        config.addDataSourceProperty("maintainTimeStats", "false");
        
        return new HikariDataSource(config);
    }
}
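
The thread pool settings above interact in a way that is easy to misread: ThreadPoolTaskExecutor wraps the JDK's ThreadPoolExecutor, which starts threads beyond the core size only once the queue is full, and CallerRunsPolicy then runs any further task on the submitting thread. The sketch below reproduces that behavior with the JDK class directly and deliberately tiny limits (core 1, max 2, queue 1) instead of the 10/50/100 used in the configuration. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo, not part of the original article.
final class CallerRunsDemo {

    private CallerRunsDemo() {}

    /** Submits four blocking tasks and reports where each one ran ("pool" or "caller"). */
    static List<String> runDemo() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Thread caller = Thread.currentThread();
        List<String> ranOn = Collections.synchronizedList(new ArrayList<>());

        for (int i = 1; i <= 4; i++) {
            final int id = i;
            pool.execute(() -> {
                boolean onCaller = Thread.currentThread() == caller;
                ranOn.add("task-" + id + ":" + (onCaller ? "caller" : "pool"));
                if (!onCaller) {
                    // Pool threads stay busy until every task has been submitted.
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        release.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        List<String> sorted = new ArrayList<>(ranOn);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Task 1 takes the core thread, task 2 waits in the queue, task 3 forces the second thread, and task 4 is rejected and therefore runs on the caller. With CallerRunsPolicy, saturation slows the submitter down rather than dropping work, which acts as simple back-pressure.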

6. DevOps Practices

6.1 CI/CD Pipeline Configuration

Jenkins Pipeline configuration:

groovy
// Jenkinsfile
pipeline {
    agent any
    
    environment {
        DOCKER_REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'microservices'
        VERSION = "${env.BUILD_NUMBER}"
    }
    
    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }
        
        stage('Build') {
            steps {
                sh 'mvn clean compile'
            }
        }
        
        stage('Test') {
            steps {
                sh 'mvn test'
            }
            post {
                always {
                    // Publish the Surefire reports with the JUnit plugin's junit step
                    junit 'target/surefire-reports/*.xml'
                }
            }
        }
        
        stage('Code Quality') {
            steps {
                sh 'mvn sonar:sonar'
            }
        }
        
        stage('Package') {
            steps {
                sh 'mvn package -DskipTests'
            }
        }
        
        stage('Docker Build') {
            steps {
                script {
                    docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION}")
                }
            }
        }
        
        stage('Docker Push') {
            steps {
                script {
                    docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
                        docker.image("${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION}").push()
                    }
                }
            }
        }
        
        stage('Deploy to Staging') {
            steps {
                script {
                    sh "kubectl set image deployment/user-service user-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n staging"
                    sh "kubectl set image deployment/order-service order-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n staging"
                }
            }
        }
        
        stage('Integration Tests') {
            steps {
                sh 'mvn verify -Pintegration-tests'
            }
        }
        
        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                script {
                    sh "kubectl set image deployment/user-service user-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n production"
                    sh "kubectl set image deployment/order-service order-service=${DOCKER_REGISTRY}/${IMAGE_NAME}:${VERSION} -n production"
                }
            }
        }
    }
    
    post {
        always {
            cleanWs()
        }
        success {
            emailext (
                subject: "Build succeeded: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
                body: "The build completed successfully.",
                to: "dev-team@example.com"
            )
        }
        failure {
            emailext (
                subject: "Build failed: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
                body: "The build failed. Please check the logs.",
                to: "dev-team@example.com"
            )
        }
    }
}
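
The pipeline runs integration tests right after `kubectl set image`, but the new pods may not be serving yet at that point. One possible gate is to poll the Actuator health endpoint until it answers HTTP 200 (by default Actuator returns 503 when the status is DOWN or OUT_OF_SERVICE). The sketch below is a minimal version of that idea using the JDK's HttpClient (Java 11+). The class name HealthGate, its methods, the retry values, and the example URL are all assumptions, not part of the original pipeline.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical post-deployment gate, not part of the original article.
final class HealthGate {

    private HealthGate() {}

    /** Polls the probe until it reports HTTP 200 or the attempts run out. */
    static boolean waitUntilHealthy(IntSupplier statusProbe, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (statusProbe.getAsInt() == 200) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    /** Builds a probe that GETs the URL and returns the status code, or -1 on failure. */
    static IntSupplier httpProbe(String url) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1; // connection refused, timeout, and similar
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }

    public static void main(String[] args) {
        // Assumed default URL; pass the real staging address as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8081/actuator/health";
        boolean healthy = waitUntilHealthy(httpProbe(url), 10, Duration.ofSeconds(3));
        System.out.println(healthy ? "healthy" : "unhealthy");
        System.exit(healthy ? 0 : 1);
    }
}
```

Because the exit code reflects the result, a pipeline step could run this class and fail the stage when the service never becomes healthy. `kubectl rollout status` is another common way to wait for a rollout.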

6.2 Kubernetes Deployment Configuration

Kubernetes deployment manifest:

yaml
# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: registry.example.com/microservices:latest
        ports:
        - containerPort: 8081
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: "production"
        - name: NACOS_SERVER_ADDR
          value: "nacos:8848"
        - name: MYSQL_HOST
          value: "mysql"
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_DATABASE
          value: "userdb"
        - name: MYSQL_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: username
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
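        # Liveness and readiness use separate health groups (Spring Boot 2.3+), so a failing
        # dependency such as the database does not cause the container to be restarted
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8081
          initialDelaySeconds: 60
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8081
          initialDelaySeconds: 30
          periodSeconds: 10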
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
      volumes:
      - name: config-volume
        configMap:
          name: user-service-config
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  selector:
    app: user-service
  ports:
  - protocol: TCP
    port: 8081
    targetPort: 8081
  type: ClusterIP
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-service-config
data:
  application.yml: |
    spring:
      application:
        name: user-service
      cloud:
        nacos:
          discovery:
            server-addr: ${NACOS_SERVER_ADDR}
          config:
            server-addr: ${NACOS_SERVER_ADDR}
      datasource:
        url: jdbc:mysql://${MYSQL_HOST}:${MYSQL_PORT}/${MYSQL_DATABASE}
        username: ${MYSQL_USERNAME}
        password: ${MYSQL_PASSWORD}
    management:
      endpoints:
        web:
          exposure:
            include: "*"
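
The `${MYSQL_HOST}`-style placeholders in the ConfigMap are resolved by Spring at startup against its environment, which includes the environment variables the Deployment injects into the container. The sketch below imitates that substitution with plain JDK code to make the data flow concrete. It is only an illustration: PlaceholderDemo and resolve are hypothetical names, and Spring's real resolver supports more (default values, nested placeholders, multiple property sources).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${NAME} substitution, not Spring's implementation.
final class PlaceholderDemo {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    private PlaceholderDemo() {}

    /** Replaces each ${NAME} in the template with env.get(NAME); fails on a missing name. */
    static String resolve(String template, Map<String, String> env) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = env.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Same values the Deployment sets through its env section
        Map<String, String> env = Map.of(
                "MYSQL_HOST", "mysql",
                "MYSQL_PORT", "3306",
                "MYSQL_DATABASE", "userdb");
        System.out.println(resolve("jdbc:mysql://${MYSQL_HOST}:${MYSQL_PORT}/${MYSQL_DATABASE}", env));
    }
}
```

With the values from the Deployment, the datasource URL becomes jdbc:mysql://mysql:3306/userdb. The username and password follow the same path, except that their values come from the mysql-secret Secret rather than literal strings.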

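7. Summary and Exercises

7.1 Key Points Review

  1. Microservice monitoring

    • Spring Boot Actuator health checks
    • Custom metrics monitoring
    • Business metrics collection

  2. ELK log analysis

    • Elasticsearch index management
    • Logstash log processing
    • Kibana visual analysis

  3. Prometheus monitoring

    • Metrics collection and storage
    • Alert rule configuration
    • Custom metrics collection

  4. Grafana visualization

    • Dashboard design
    • Alert configuration
    • Data visualization

  5. APM performance monitoring

    • Application performance monitoring
    • Performance optimization recommendations
    • Performance metrics analysis

  6. DevOps practices

    • CI/CD pipelines
    • Kubernetes deployment
    • Automated operations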

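7.2 Hands-on Exercises

Exercise 1: Set up a monitoring system

  • Configure Spring Boot Actuator
  • Implement a custom health check
  • Configure business metrics monitoring

Exercise 2: ELK log analysis

  • Configure an Elasticsearch cluster
  • Set up a Logstash pipeline
  • Design a Kibana dashboard

Exercise 3: Prometheus monitoring

  • Configure a Prometheus server
  • Set up alert rules
  • Implement custom metrics collection

Exercise 4: Grafana visualization

  • Design a monitoring dashboard
  • Configure alert notifications
  • Implement data visualizations

Exercise 5: APM performance monitoring

  • Implement a performance monitoring aspect
  • Configure performance tuning parameters
  • Analyze performance bottlenecks

Exercise 6: DevOps practices

  • Configure a CI/CD pipeline
  • Implement a Kubernetes deployment
  • Set up automated operations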

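7.3 Study Advice

  1. Theory: build a solid understanding of the core concepts of monitoring and operations
  2. Hands-on practice: build monitoring systems in real projects
  3. Tooling: become proficient with the monitoring and operations tools
  4. Performance optimization: learn methods for analyzing and tuning system performance
  5. Automated operations: master DevOps best practices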

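8. Tomorrow's Preview

Day 27 will cover:

  • Microservice security: OAuth2, JWT, API security
  • Service mesh: managing services with Istio
  • Cloud-native architecture: 12-Factor App, cloud-native design
  • Performance tuning: JVM tuning, database optimization
  • Troubleshooting: diagnosing problems and resolving them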