Part 9: Monitoring and Operations - Integrating Actuator Health Checks

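Foreword

Monitoring and operations are indispensable in production. This chapter integrates Spring Boot Actuator to add health checks, metrics, and runtime management to the log framework, so that its state can be observed and adjusted while the application is running.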

Actuator Integration Architecture

graph LR
    A[Actuator endpoints] --> B[LogEndpoint]
    A --> C[LogHealthIndicator]
    B --> D[Log statistics]
    C --> E[Health status checks]
    E --> F[Exposed monitoring metrics]

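Integration points:

  • 🏥 Health checks: monitor the framework's runtime state
  • 📊 Metrics collection: count log calls and record performance data
  • ⚙️ Runtime management: adjust the log configuration dynamically
  • 🔍 Troubleshooting: expose information for debugging and diagnosis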

LogHealthIndicator - Health Check Indicator

package com.simpleflow.log.springboot.actuator;

import com.simpleflow.log.context.ThreadLocalTraceHolder;
import com.simpleflow.log.processor.AnnotationConfigResolver;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/**
 * Health indicator for the log framework.
 *
 * Monitors the framework's runtime state, performance figures, and resource usage.
 */
@Component
public class LogHealthIndicator implements HealthIndicator {
    
    private final AnnotationConfigResolver configResolver;
    private volatile long lastCheckTime = System.currentTimeMillis();
    private volatile boolean lastCheckResult = true;
    
    public LogHealthIndicator(AnnotationConfigResolver configResolver) {
        this.configResolver = configResolver;
    }
    
    @Override
    public Health health() {
        try {
            Health.Builder builder = new Health.Builder();

            // Check the core components
            boolean coreHealthy = checkCoreComponents();

            // Check heap memory usage
            MemoryStatus memoryStatus = checkMemoryUsage();

            // Check thread counts
            ThreadStatus threadStatus = checkThreadStatus();

            // Check the configuration cache
            CacheStatus cacheStatus = checkCacheStatus();

            // Healthy only if every individual check passes
            boolean isHealthy = coreHealthy &&
                              memoryStatus.isHealthy() &&
                              threadStatus.isHealthy() &&
                              cacheStatus.isHealthy();

            if (isHealthy) {
                builder.up();
            } else {
                builder.down();
            }

            // Attach details. The status objects are added as strings so that the
            // JSON shows their toString() summaries; passing the objects themselves
            // would make Jackson serialize their getters as nested fields.
            builder.withDetail("core", coreHealthy ? "UP" : "DOWN")
                   .withDetail("memory", memoryStatus.toString())
                   .withDetail("threads", threadStatus.toString())
                   .withDetail("cache", cacheStatus.toString())
                   .withDetail("lastCheckTime", lastCheckTime)
                   .withDetail("uptime", getUptimeInfo());

            lastCheckTime = System.currentTimeMillis();
            lastCheckResult = isHealthy;

            return builder.build();

        } catch (Exception e) {
            return Health.down()
                        .withDetail("error", e.getMessage())
                        .withDetail("lastCheckTime", lastCheckTime)
                        .build();
        }
    }
    
    /**
     * Checks the core components.
     */
    private boolean checkCoreComponents() {
        try {
            // The config resolver must be available
            if (configResolver == null) {
                return false;
            }

            // Verify that the ThreadLocal trace context works, and always clean it up
            ThreadLocalTraceHolder.initTrace();
            try {
                return ThreadLocalTraceHolder.getCurrentTrace() != null;
            } finally {
                ThreadLocalTraceHolder.clearCurrentTrace();
            }

        } catch (Exception e) {
            return false;
        }
    }
    
    /**
     * Checks heap memory usage.
     */
    private MemoryStatus checkMemoryUsage() {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        
        long usedMemory = memoryBean.getHeapMemoryUsage().getUsed();
        long maxMemory = memoryBean.getHeapMemoryUsage().getMax();
        
        double usagePercentage = maxMemory > 0 ? (double) usedMemory / maxMemory * 100 : 0;
        
        return new MemoryStatus(usedMemory, maxMemory, usagePercentage);
    }
    
    /**
     * Checks thread counts.
     */
    private ThreadStatus checkThreadStatus() {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        
        int threadCount = threadBean.getThreadCount();
        int daemonThreadCount = threadBean.getDaemonThreadCount();
        long totalStartedThreadCount = threadBean.getTotalStartedThreadCount();
        
        return new ThreadStatus(threadCount, daemonThreadCount, totalStartedThreadCount);
    }
    
    /**
     * Checks the configuration cache.
     */
    private CacheStatus checkCacheStatus() {
        try {
            String cacheStats = configResolver.getCacheStats();
            return new CacheStatus(true, cacheStats);
        } catch (Exception e) {
            return new CacheStatus(false, "cache status check failed: " + e.getMessage());
        }
    }
    
    /**
     * Returns the JVM uptime as a readable string.
     */
    private String getUptimeInfo() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long hours = uptime / (1000 * 60 * 60);
        long minutes = (uptime % (1000 * 60 * 60)) / (1000 * 60);
        long seconds = (uptime % (1000 * 60)) / 1000;

        return String.format("%dh %dm %ds", hours, minutes, seconds);
    }

    // ========== Internal status classes ==========
    
    public static class MemoryStatus {
        private final long usedMemory;
        private final long maxMemory;
        private final double usagePercentage;
        
        public MemoryStatus(long usedMemory, long maxMemory, double usagePercentage) {
            this.usedMemory = usedMemory;
            this.maxMemory = maxMemory;
            this.usagePercentage = usagePercentage;
        }
        
        public boolean isHealthy() {
            return usagePercentage < 90.0; // healthy while heap usage is below 90%
        }

        public long getUsedMemory() { return usedMemory; }
        public long getMaxMemory() { return maxMemory; }
        public double getUsagePercentage() { return usagePercentage; }

        @Override
        public String toString() {
            return String.format("used: %dMB, max: %dMB, usage: %.2f%%",
                usedMemory / 1024 / 1024,
                maxMemory / 1024 / 1024,
                usagePercentage);
        }
    }
    
    public static class ThreadStatus {
        private final int threadCount;
        private final int daemonThreadCount;
        private final long totalStartedThreadCount;
        
        public ThreadStatus(int threadCount, int daemonThreadCount, long totalStartedThreadCount) {
            this.threadCount = threadCount;
            this.daemonThreadCount = daemonThreadCount;
            this.totalStartedThreadCount = totalStartedThreadCount;
        }
        
        public boolean isHealthy() {
            return threadCount < 1000; // healthy while there are fewer than 1000 live threads
        }

        public int getThreadCount() { return threadCount; }
        public int getDaemonThreadCount() { return daemonThreadCount; }
        public long getTotalStartedThreadCount() { return totalStartedThreadCount; }

        @Override
        public String toString() {
            return String.format("live threads: %d, daemon threads: %d, total started: %d",
                threadCount, daemonThreadCount, totalStartedThreadCount);
        }
    }
    
    public static class CacheStatus {
        private final boolean healthy;
        private final String stats;
        
        public CacheStatus(boolean healthy, String stats) {
            this.healthy = healthy;
            this.stats = stats;
        }
        
        public boolean isHealthy() { return healthy; }
        public String getStats() { return stats; }
        
        @Override
        public String toString() {
            return stats;
        }
    }
}
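
The threshold and uptime logic above can be tried outside Spring. The following is a minimal sketch that uses only the JDK; the class name `HealthCheckSketch`, the configurable `limitPercent` parameter, and the compact `h/m/s` output format are assumptions made for illustration and are not part of the framework.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper, not part of the framework: the same checks as
// LogHealthIndicator, reduced to pure functions that are easy to test.
final class HealthCheckSketch {

    /** Heap usage as a percentage; 0 when the JVM reports no maximum (max <= 0). */
    static double usagePercent(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? (double) usedBytes / maxBytes * 100 : 0;
    }

    /** Same rule as MemoryStatus.isHealthy(), with the limit passed in instead of hard-coded. */
    static boolean isHealthy(double usagePercent, double limitPercent) {
        return usagePercent < limitPercent;
    }

    /** Formats an uptime in milliseconds as hours, minutes, and seconds. */
    static String formatUptime(long uptimeMillis) {
        Duration d = Duration.ofMillis(uptimeMillis);
        return String.format(Locale.ROOT, "%dh %dm %ds",
                d.toHours(), d.toMinutes() % 60, d.getSeconds() % 60);
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double percent = usagePercent(heap.getUsed(), heap.getMax());
        System.out.println("heap healthy: " + isHealthy(percent, 90.0));
        System.out.println("uptime: "
                + formatUptime(ManagementFactory.getRuntimeMXBean().getUptime()));
    }
}
```

Keeping the checks as pure functions makes the limits easy to unit test, and would let them be read from configuration rather than fixed at 90% and 1000 threads.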

LogEndpoint - Custom Endpoint

package com.simpleflow.log.springboot.actuator;

import com.simpleflow.log.config.LogConfig;
import com.simpleflow.log.processor.AnnotationConfigResolver;
import com.simpleflow.log.springboot.properties.LogProperties;
import org.springframework.boot.actuate.endpoint.annotation.Endpoint;
import org.springframework.boot.actuate.endpoint.annotation.ReadOperation;
import org.springframework.boot.actuate.endpoint.annotation.Selector;
import org.springframework.boot.actuate.endpoint.annotation.WriteOperation;
import org.springframework.stereotype.Component;

import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Custom Actuator endpoint for the log framework.
 *
 * Exposes log statistics, the current configuration, and runtime management operations.
 */
@Component
@Endpoint(id = "simpleflow-log")
public class LogEndpoint {

    private final LogProperties logProperties;
    private final AnnotationConfigResolver configResolver;
    private final LogConfig defaultLogConfig;

    // Statistics
    private final AtomicLong totalLogCount = new AtomicLong(0);
    private final AtomicLong errorLogCount = new AtomicLong(0);
    private final AtomicLong cacheHitCount = new AtomicLong(0);
    private final AtomicLong cacheMissCount = new AtomicLong(0);

    private volatile LocalDateTime startTime = LocalDateTime.now();

    public LogEndpoint(LogProperties logProperties,
                      AnnotationConfigResolver configResolver,
                      LogConfig defaultLogConfig) {
        this.logProperties = logProperties;
        this.configResolver = configResolver;
        this.defaultLogConfig = defaultLogConfig;
    }

    /**
     * Returns the overall framework status (GET /actuator/simpleflow-log).
     */
    @ReadOperation
    public Map<String, Object> info() {
        Map<String, Object> info = new HashMap<>();

        // Basic information
        info.put("framework", "SimpleFlow Log Framework");
        info.put("version", "1.0.0");
        info.put("startTime", startTime);
        info.put("status", "RUNNING");

        // Configuration
        info.put("configuration", getConfigurationInfo());

        // Statistics
        info.put("statistics", getStatisticsInfo());

        // Performance
        info.put("performance", getPerformanceInfo());

        // Cache
        info.put("cache", getCacheInfo());

        return info;
    }

    /**
     * Returns one section by name (GET /actuator/simpleflow-log/{section}).
     *
     * An endpoint may declare only one operation per HTTP method and path, so
     * "config" and "stats" are dispatched through a @Selector instead of being
     * separate no-argument @ReadOperation methods, which Actuator rejects at
     * startup as duplicate operations.
     */
    @ReadOperation
    public Map<String, Object> section(@Selector String section) {
        switch (section) {
            case "config":
                return config();
            case "stats":
                return stats();
            default:
                return null; // unknown section: Actuator answers with 404
        }
    }

    /**
     * Returns the configuration.
     */
    public Map<String, Object> config() {
        Map<String, Object> config = new HashMap<>();

        config.put("enabled", logProperties.isEnabled());
        config.put("defaultLevel", logProperties.getDefaultLevel());
        config.put("webEnabled", logProperties.isWebEnabled());
        config.put("logArgs", logProperties.isLogArgs());
        config.put("logResult", logProperties.isLogResult());
        config.put("logExecutionTime", logProperties.isLogExecutionTime());
        config.put("globalSensitiveFields", logProperties.getGlobalSensitiveFields());

        // Request log configuration
        Map<String, Object> requestLogConfig = new HashMap<>();
        LogProperties.RequestLog requestLog = logProperties.getRequestLog();
        requestLogConfig.put("enabled", requestLog.isEnabled());
        requestLogConfig.put("logHeaders", requestLog.isLogHeaders());
        requestLogConfig.put("logParameters", requestLog.isLogParameters());
        requestLogConfig.put("excludePatterns", requestLog.getExcludePatterns());
        config.put("requestLog", requestLogConfig);

        // Performance configuration
        Map<String, Object> performanceConfig = new HashMap<>();
        LogProperties.Performance performance = logProperties.getPerformance();
        performanceConfig.put("asyncEnabled", performance.isAsyncEnabled());
        performanceConfig.put("asyncQueueSize", performance.getAsyncQueueSize());
        performanceConfig.put("cacheSize", performance.getCacheSize());
        performanceConfig.put("maxLogLength", performance.getMaxLogLength());
        config.put("performance", performanceConfig);

        return config;
    }

    /**
     * Returns the statistics.
     */
    public Map<String, Object> stats() {
        return getStatisticsInfo();
    }

    /**
     * Runs a management action by name (POST /actuator/simpleflow-log/{action}).
     * Supported actions: "clearCache" and "resetStats".
     */
    @WriteOperation
    public Map<String, Object> execute(@Selector String action) {
        switch (action) {
            case "clearCache":
                return clearCache();
            case "resetStats":
                return resetStats();
            default:
                Map<String, Object> result = new HashMap<>();
                result.put("success", false);
                result.put("message", "Unknown action: " + action);
                result.put("timestamp", LocalDateTime.now());
                return result;
        }
    }

    /**
     * Clears the configuration cache.
     */
    public Map<String, Object> clearCache() {
        try {
            configResolver.clearAllCache();

            Map<String, Object> result = new HashMap<>();
            result.put("success", true);
            result.put("message", "Cache cleared");
            result.put("timestamp", LocalDateTime.now());

            return result;

        } catch (Exception e) {
            Map<String, Object> result = new HashMap<>();
            result.put("success", false);
            result.put("message", "Failed to clear cache: " + e.getMessage());
            result.put("timestamp", LocalDateTime.now());

            return result;
        }
    }

    /**
     * Resets the statistics.
     */
    public Map<String, Object> resetStats() {
        totalLogCount.set(0);
        errorLogCount.set(0);
        cacheHitCount.set(0);
        cacheMissCount.set(0);
        startTime = LocalDateTime.now();

        Map<String, Object> result = new HashMap<>();
        result.put("success", true);
        result.put("message", "Statistics reset");
        result.put("timestamp", LocalDateTime.now());

        return result;
    }
    
    // ========== Private helpers ==========
    
    private Map<String, Object> getConfigurationInfo() {
        Map<String, Object> config = new HashMap<>();
        config.put("enabled", logProperties.isEnabled());
        config.put("defaultLevel", logProperties.getDefaultLevel());
        config.put("webEnabled", logProperties.isWebEnabled());
        config.put("actuatorEnabled", logProperties.isActuatorEnabled());
        return config;
    }
    
    private Map<String, Object> getStatisticsInfo() {
        Map<String, Object> stats = new HashMap<>();
        stats.put("totalLogCount", totalLogCount.get());
        stats.put("errorLogCount", errorLogCount.get());
        stats.put("successRate", calculateSuccessRate());
        stats.put("startTime", startTime);
        stats.put("uptime", calculateUptime());
        return stats;
    }
    
    private Map<String, Object> getPerformanceInfo() {
        Map<String, Object> performance = new HashMap<>();
        
        // Collect JVM memory figures
        Runtime runtime = Runtime.getRuntime();
        performance.put("totalMemory", runtime.totalMemory());
        performance.put("freeMemory", runtime.freeMemory());
        performance.put("maxMemory", runtime.maxMemory());
        performance.put("usedMemory", runtime.totalMemory() - runtime.freeMemory());
        
        // Memory usage as a percentage of the maximum heap
        double memoryUsage = (double) (runtime.totalMemory() - runtime.freeMemory()) / runtime.maxMemory() * 100;
        performance.put("memoryUsagePercentage", String.format("%.2f%%", memoryUsage));
        
        return performance;
    }
    
    private Map<String, Object> getCacheInfo() {
        Map<String, Object> cache = new HashMap<>();
        cache.put("stats", configResolver.getCacheStats());
        cache.put("hitCount", cacheHitCount.get());
        cache.put("missCount", cacheMissCount.get());
        cache.put("hitRate", calculateHitRate());
        return cache;
    }
    
    private String calculateSuccessRate() {
        long total = totalLogCount.get();
        if (total == 0) {
            return "0.00%";
        }
        double rate = (double) (total - errorLogCount.get()) / total * 100;
        return String.format("%.2f%%", rate);
    }
    
    private String calculateUptime() {
        LocalDateTime now = LocalDateTime.now();
        long minutes = java.time.Duration.between(startTime, now).toMinutes();
        long hours = minutes / 60;
        long remainingMinutes = minutes % 60;
        
        return String.format("%dh %dm", hours, remainingMinutes);
    }
    
    private String calculateHitRate() {
        long total = cacheHitCount.get() + cacheMissCount.get();
        if (total == 0) {
            return "0.00%";
        }
        double rate = (double) cacheHitCount.get() / total * 100;
        return String.format("%.2f%%", rate);
    }
    
    // ========== Counter methods (called by the framework internally) ==========
    
    public void incrementLogCount() {
        totalLogCount.incrementAndGet();
    }
    
    public void incrementErrorCount() {
        errorLogCount.incrementAndGet();
    }
    
    public void incrementCacheHit() {
        cacheHitCount.incrementAndGet();
    }
    
    public void incrementCacheMiss() {
        cacheMissCount.incrementAndGet();
    }
}
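
The counters only become meaningful once the framework calls the increment methods around each logged invocation. The sketch below shows one way that wiring might look, using only the JDK. `LogStatsSketch` and its `record` method are hypothetical stand-ins for illustration; the success-rate formula is the same as in `calculateSuccessRate()`.

```java
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the counting part of LogEndpoint.
final class LogStatsSketch {

    private final AtomicLong totalLogCount = new AtomicLong();
    private final AtomicLong errorLogCount = new AtomicLong();

    /** Runs the task, counting every call and every call that throws. */
    <T> T record(Callable<T> task) throws Exception {
        totalLogCount.incrementAndGet();
        try {
            return task.call();
        } catch (Exception e) {
            errorLogCount.incrementAndGet();
            throw e;
        }
    }

    /** (total - errors) / total as a percentage with two decimals. */
    String successRate() {
        long total = totalLogCount.get();
        if (total == 0) {
            return "0.00%";
        }
        double rate = (double) (total - errorLogCount.get()) / total * 100;
        return String.format(Locale.ROOT, "%.2f%%", rate);
    }

    public static void main(String[] args) throws Exception {
        LogStatsSketch stats = new LogStatsSketch();
        for (int i = 0; i < 1250; i++) {
            final boolean fail = i < 3; // simulate 3 failures out of 1250 calls
            try {
                stats.record(() -> {
                    if (fail) {
                        throw new IllegalStateException("simulated failure");
                    }
                    return "ok";
                });
            } catch (IllegalStateException expected) {
                // already counted as an error inside record()
            }
        }
        System.out.println(stats.successRate()); // prints 99.76%
    }
}
```

With 1250 calls and 3 errors the rate is 1247 / 1250 = 99.76%.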

LogMetrics - Metrics Collector

package com.simpleflow.log.springboot.actuator;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.boot.autoconfigure.condition.ConditionalOnClass;
import org.springframework.stereotype.Component;

import java.util.concurrent.atomic.AtomicLong;

/**
 * Metrics collector for the log framework.
 *
 * Integrates with Micrometer to publish counters, timers, and gauges.
 */
@Component
@ConditionalOnClass(MeterRegistry.class)
public class LogMetrics {
    
    private final MeterRegistry meterRegistry;
    
    // Counters
    private final Counter logMethodCallCounter;
    private final Counter logErrorCounter;
    private final Counter cacheHitCounter;
    private final Counter cacheMissCounter;

    // Timers
    private final Timer logExecutionTimer;
    private final Timer configResolveTimer;

    // Backing values for the gauges
    private final AtomicLong activeLogContexts;
    private final AtomicLong cacheSize;
    
    public LogMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        
        // Register the counters
        this.logMethodCallCounter = Counter.builder("simpleflow.log.method.calls")
                .description("Total number of log method calls")
                .register(meterRegistry);
        
        this.logErrorCounter = Counter.builder("simpleflow.log.errors")
                .description("Total number of log errors")
                .register(meterRegistry);
        
        this.cacheHitCounter = Counter.builder("simpleflow.log.cache.hits")
                .description("Number of cache hits")
                .register(meterRegistry);
        
        this.cacheMissCounter = Counter.builder("simpleflow.log.cache.misses")
                .description("Number of cache misses")
                .register(meterRegistry);
        
        // Register the timers
        this.logExecutionTimer = Timer.builder("simpleflow.log.execution.time")
                .description("Log execution time")
                .register(meterRegistry);
        
        this.configResolveTimer = Timer.builder("simpleflow.log.config.resolve.time")
                .description("Configuration resolve time")
                .register(meterRegistry);
        
        // Register the gauges
        this.activeLogContexts = new AtomicLong(0);
        this.cacheSize = new AtomicLong(0);
        
        Gauge.builder("simpleflow.log.contexts.active")
             .description("Number of active log contexts")
             .register(meterRegistry, activeLogContexts, AtomicLong::get);
        
        Gauge.builder("simpleflow.log.cache.size")
             .description("Current cache size")
             .register(meterRegistry, cacheSize, AtomicLong::get);
    }
    
    // ========== Recording methods ==========
    
    public void recordMethodCall() {
        logMethodCallCounter.increment();
    }
    
    public void recordError() {
        logErrorCounter.increment();
    }
    
    public void recordCacheHit() {
        cacheHitCounter.increment();
    }
    
    public void recordCacheMiss() {
        cacheMissCounter.increment();
    }
    
    public Timer.Sample startExecutionTimer() {
        return Timer.start(meterRegistry);
    }
    
    public void stopExecutionTimer(Timer.Sample sample) {
        sample.stop(logExecutionTimer);
    }
    
    public Timer.Sample startConfigResolveTimer() {
        return Timer.start(meterRegistry);
    }
    
    public void stopConfigResolveTimer(Timer.Sample sample) {
        sample.stop(configResolveTimer);
    }
    
    public void setActiveContexts(long count) {
        activeLogContexts.set(count);
    }
    
    public void setCacheSize(long size) {
        cacheSize.set(size);
    }
    
    public void incrementActiveContexts() {
        activeLogContexts.incrementAndGet();
    }
    
    public void decrementActiveContexts() {
        activeLogContexts.decrementAndGet();
    }
}
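
The start/stop and increment/decrement methods come in pairs, so callers need to make sure the second half always runs. The sketch below illustrates the try/finally pattern with plain JDK types. `TimedInvocationSketch` is a hypothetical stand-in that mimics the timer and the active-context gauge with `AtomicLong` fields; it does not use Micrometer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the timer and gauge handling in LogMetrics.
final class TimedInvocationSketch {

    final AtomicLong activeContexts = new AtomicLong(); // gauge value
    final AtomicLong totalNanos = new AtomicLong();     // accumulated execution time
    final AtomicLong calls = new AtomicLong();          // number of timed calls

    /** Times the task and keeps the gauge balanced, even when the task throws. */
    <T> T timed(Callable<T> task) throws Exception {
        activeContexts.incrementAndGet();
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            // Runs on success and on failure, so the gauge cannot drift upwards
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
            activeContexts.decrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        TimedInvocationSketch metrics = new TimedInvocationSketch();
        String result = metrics.timed(() -> "done");
        System.out.println(result + ", calls=" + metrics.calls.get()
                + ", active=" + metrics.activeContexts.get());
    }
}
```

With the real `LogMetrics`, the same shape would apply: call `startExecutionTimer()` and `incrementActiveContexts()` before the invocation, then `stopExecutionTimer(sample)` and `decrementActiveContexts()` in a `finally` block.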

Usage and Testing

1. Enable via configuration

# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,simpleflow-log,metrics
  endpoint:
    health:
      show-details: always
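  health:
    log:
      # Takes effect only if LogHealthIndicator is registered conditionally,
      # for example with @ConditionalOnEnabledHealthIndicator("log")
      enabled: true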

simpleflow:
  log:
    actuator-enabled: true

2. Call the endpoints

# Health check
curl http://localhost:8080/actuator/health/log

# View log framework information
curl http://localhost:8080/actuator/simpleflow-log

# View the configuration
curl http://localhost:8080/actuator/simpleflow-log/config

# View statistics
curl http://localhost:8080/actuator/simpleflow-log/stats

# Clear the cache
curl -X POST http://localhost:8080/actuator/simpleflow-log/clearCache

# View a single Micrometer metric through the Actuator metrics endpoint
curl http://localhost:8080/actuator/metrics/simpleflow.log.method.calls
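
The same calls can be made from Java, for example in an operations script or an integration test. The sketch below only builds the requests with the JDK's `java.net.http` API (Java 11 or later). The class `ActuatorRequestsSketch` and its method names are hypothetical, and the base URL is assumed to be the `http://localhost:8080` used in the curl commands.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that mirrors the curl commands for the custom endpoint.
final class ActuatorRequestsSketch {

    private static final String ENDPOINT_PATH = "/actuator/simpleflow-log/";

    /** GET request for a read operation, e.g. "config" or "stats". */
    static HttpRequest read(String baseUrl, String section) {
        return HttpRequest.newBuilder(URI.create(baseUrl + ENDPOINT_PATH + section))
                .GET()
                .build();
    }

    /** POST request without a body for a write operation, e.g. "clearCache". */
    static HttpRequest write(String baseUrl, String action) {
        return HttpRequest.newBuilder(URI.create(baseUrl + ENDPOINT_PATH + action))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest stats = read("http://localhost:8080", "stats");
        HttpRequest clear = write("http://localhost:8080", "clearCache");
        System.out.println(stats.method() + " " + stats.uri());
        System.out.println(clear.method() + " " + clear.uri());
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the application is running.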

3. Sample output

Health check output:

{
  "status": "UP",
  "details": {
    "core": "UP",
    "memory": "used: 256MB, max: 1024MB, usage: 25.00%",
    "threads": "live threads: 45, daemon threads: 12, total started: 67",
    "cache": "MethodCache: 150, ClassCache: 25",
    "lastCheckTime": 1692766815123,
    "uptime": "2h 15m 30s"
  }
}

Framework information output:

{
  "framework": "SimpleFlow Log Framework",
  "version": "1.0.0",
  "startTime": "2024-08-23T08:30:15",
  "status": "RUNNING",
  "configuration": {
    "enabled": true,
    "defaultLevel": "INFO",
    "webEnabled": true,
    "actuatorEnabled": true
  },
  "statistics": {
    "totalLogCount": 1250,
    "errorLogCount": 3,
    "successRate": "99.76%",
    "startTime": "2024-08-23T08:30:15",
    "uptime": "2h 15m"
  },
  "performance": {
    "totalMemory": 268435456,
    "freeMemory": 201326592,
    "maxMemory": 268435456,
    "usedMemory": 67108864,
    "memoryUsagePercentage": "25.00%"
  },
  "cache": {
    "stats": "MethodCache: 150, ClassCache: 25",
    "hitCount": 980,
    "missCount": 195,
    "hitRate": "83.40%"
  }
}

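Chapter Summary

✅ What we completed

  1. Health checks: implemented LogHealthIndicator to monitor the framework's state
  2. Custom endpoint: created LogEndpoint to provide management operations
  3. Metrics collection: integrated Micrometer to collect performance metrics
  4. Runtime management: added operations such as clearing the cache and resetting statistics
  5. Monitoring integration: a complete Actuator integration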

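🎯 Key takeaways

  • The right way to extend Actuator
  • Design principles for health indicators
  • Techniques for implementing custom endpoints
  • Integrating metrics collection with a monitoring system
  • Security considerations for runtime management operations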

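💡 Questions to think about

  1. How would you design finer-grained health checks?
  2. How should alert thresholds for the metrics be chosen?
  3. How can the management endpoints be secured?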

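🚀 Next chapter

In the final chapter we will build a complete sample application that brings all the modules together, and test it thoroughly to show how the framework behaves in a real project.


💡 Design principle: a good monitoring system detects problems proactively, provides detailed information, and supports a fast response. The Actuator integration gives the framework production-grade observability.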
