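Spring Boot Actuator in Depth: Health Checks, Metrics Exposure, and Endpoint Security
Spring Boot Actuator is a core tool for production monitoring, providing health checks, metrics collection, and endpoint management. This article examines these three capabilities and the security practices needed to run them in production.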
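1. Health Check Mechanism: From Internals to Customization
1.1 Core Architecture and How It Works
Health checks are built on the `HealthContributor` abstraction: `HealthEndpoint` aggregates the results of all registered `HealthIndicator` beans.
Core components:
- `HealthIndicator` interface: the foundation of every health check; it defines a `health()` method that returns a `Health` object
- `HealthContributorRegistry`: the central registry that holds all registered `HealthContributor` instances
- `HealthEndpoint`: exposes the `/actuator/health` endpoint and aggregates the results
- `StatusAggregator` (the replacement for `OrderedHealthAggregator`, which was deprecated in Spring Boot 2.2): aggregates on a "worst status wins" basis, so a single DOWN component makes the overall status DOWN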
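The "worst status wins" rule can be illustrated with a few lines of plain Java. This is a minimal sketch, not Spring Boot's implementation: the class name `WorstStatusAggregator` is hypothetical, and the severity order is assumed to follow the documented default (DOWN, OUT_OF_SERVICE, UP, UNKNOWN).

```java
import java.util.List;

/** Simplified stand-in for "worst status wins" aggregation; not Spring Boot's actual class. */
final class WorstStatusAggregator {
    // Severity order from most to least severe (assumed to mirror the documented default).
    private static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    private WorstStatusAggregator() {
    }

    /** Returns the most severe known status, or UNKNOWN when no known status is present. */
    static String aggregate(List<String> statuses) {
        return statuses.stream()
                .filter(ORDER::contains) // statuses outside the order are ignored
                .min((a, b) -> Integer.compare(ORDER.indexOf(a), ORDER.indexOf(b)))
                .orElse("UNKNOWN");
    }
}
```

In this sketch a status that is missing from the order, such as a custom one, does not influence the result, which is why custom statuses need to be added to the configured order.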
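Default health statuses and their HTTP mappings:
- UP: the component is healthy (HTTP 200)
- DOWN: the component has failed (HTTP 503)
- OUT_OF_SERVICE: the component has been taken out of service and should not be used (HTTP 503)
- UNKNOWN: the state cannot be determined (HTTP 200, because no mapping is defined for it by default)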
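How a status becomes an HTTP response code can be sketched as a simple lookup. The snippet assumes Spring Boot's documented defaults, in which only DOWN and OUT_OF_SERVICE map to 503 and every other status falls back to 200; the class name `HealthHttpStatus` is hypothetical.

```java
import java.util.Map;

/** Illustrative status-to-HTTP lookup; only statuses with an explicit mapping deviate from 200. */
final class HealthHttpStatus {
    private static final Map<String, Integer> MAPPING = Map.of(
            "DOWN", 503,
            "OUT_OF_SERVICE", 503);

    private HealthHttpStatus() {
    }

    /** Statuses without an explicit mapping (UP, UNKNOWN, custom ones) return 200. */
    static int httpStatusFor(String status) {
        return MAPPING.getOrDefault(status, 200);
    }
}
```

A custom status therefore answers with 200 until it is given its own entry, which in Spring Boot is done through `management.endpoint.health.status.http-mapping.<status>`.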
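1.2 Built-in HealthIndicators
Spring Boot ships more than 20 health indicators and registers each one automatically when its trigger is present on the classpath or in the application context:
| Name | Trigger | What it checks |
|---|---|---|
| `DataSourceHealthIndicator` | A `DataSource` bean exists | Database connection validity |
| `RedisHealthIndicator` | A `RedisConnectionFactory` bean exists | Redis connectivity |
| `DiskSpaceHealthIndicator` | Always enabled | Free disk space (default threshold 10MB) |
| `PingHealthIndicator` | Always enabled | Basic liveness check |
| `MailHealthIndicator` | A `JavaMailSenderImpl` bean exists | SMTP server connectivity |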
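Configuration example:
```yaml
management:
  endpoint:
    health:
      show-details: when-authorized   # details visible only to authorized users
      show-components: always         # always list component statuses
      probes:
        enabled: true                 # enable the Kubernetes liveness/readiness probes
  health:
    diskspace:
      threshold: 50MB                 # custom disk-space threshold
    redis:
      enabled: true                   # explicitly enable the Redis check
    db:
      enabled: true
      # There is no validation-query property under management.health.db;
      # by default the indicator validates connections via JDBC Connection.isValid().
```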
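1.3 Custom HealthIndicator
Scenario: checking core business logic (for example, the order-processing backlog)
```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class OrderProcessingHealthIndicator implements HealthIndicator {
    private static final long PENDING_THRESHOLD = 10_000;
    private final OrderRepository orderRepository;

    public OrderProcessingHealthIndicator(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @Override
    public Health health() {
        try {
            // Count the orders that are still waiting to be processed
            long pendingCount = orderRepository.countByStatus(OrderStatus.PENDING);
            if (pendingCount > PENDING_THRESHOLD) {
                // Report a custom status instead of DOWN: the service works but is backed up
                return Health.status("DEGRADED")
                        .withDetail("pendingOrders", pendingCount)
                        .withDetail("threshold", PENDING_THRESHOLD)
                        .withDetail("message", "Too many pending orders")
                        .build();
            }
            return Health.up()
                    .withDetail("pendingOrders", pendingCount)
                    .withDetail("timestamp", System.currentTimeMillis())
                    .build();
        } catch (Exception e) {
            // Health.down(e) already records the exception as the "error" detail
            return Health.down(e).build();
        }
    }
}
```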
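Response from /actuator/health. The top-level status becomes DEGRADED only if that custom status is listed in `management.endpoint.health.status.order` (for example `DOWN,OUT_OF_SERVICE,DEGRADED,UP,UNKNOWN`); statuses missing from the order are ignored during aggregation:
```json
{
  "status": "DEGRADED",
  "components": {
    "orderProcessing": {
      "status": "DEGRADED",
      "details": {
        "pendingOrders": 12500,
        "threshold": 10000,
        "message": "Too many pending orders"
      }
    },
    "db": { "status": "UP" },
    "diskSpace": { "status": "UP" }
  }
}
```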
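1.4 Kubernetes Probe Support
Spring Boot 2.3 and later natively support Kubernetes liveness and readiness probes. The probe groups are enabled automatically when the application detects a Kubernetes environment and can be switched on elsewhere with the settings below:
```yaml
# application.yml
management:
  health:
    livenessstate:
      enabled: true    # liveness: is the application's internal state still valid?
    readinessstate:
      enabled: true    # readiness: can the application accept traffic?

# Kubernetes Deployment (container spec excerpt)
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```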
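The difference between the two probes is easier to see as a state-to-response mapping. The following is a rough sketch under stated assumptions: the enum constants borrow the names of Spring Boot's availability states, while the class `ProbeSketch` and its methods are hypothetical and only model the HTTP status a probe would observe.

```java
/** Sketch of how availability states translate into probe responses; not Spring Boot code. */
final class ProbeSketch {
    enum Liveness { CORRECT, BROKEN }

    enum Readiness { ACCEPTING_TRAFFIC, REFUSING_TRAFFIC }

    private ProbeSketch() {
    }

    /** Liveness: a failing response tells Kubernetes to restart the container. */
    static int livenessHttpStatus(Liveness state) {
        return state == Liveness.CORRECT ? 200 : 503;
    }

    /** Readiness: a failing response only removes the pod from load balancing. */
    static int readinessHttpStatus(Readiness state) {
        return state == Readiness.ACCEPTING_TRAFFIC ? 200 : 503;
    }
}
```

Keeping the two states separate is the point of the design: an application that is temporarily overloaded can refuse traffic without being restarted.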
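2. Metrics Exposure: Micrometer and Three Collection Modes
2.1 Micrometer Integration Architecture
Micrometer is a metrics facade; Spring Boot Actuator exposes metrics by auto-configuring a `MeterRegistry`.
Core components:
- `MeterRegistry`: the registry that creates and manages all meters
- `PrometheusMeterRegistry`: renders meters in the Prometheus exposition format
- `CompositeMeterRegistry`: fans out to several registries; Spring Boot creates one as the primary registry when zero or more than one registry bean is present
- `@ConditionalOnMissingBean`: lets the auto-configuration back off when the application defines its own registry bean
Auto-configuration classes: `MetricsAutoConfiguration` and `CompositeMeterRegistryAutoConfiguration`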
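2.2 Comparing the Three Collection Modes
Scenario: push-based versus pull-based monitoring systems
| Aspect | Step (delta) mode | Cumulative mode | Pull mode |
|---|---|---|---|
| Meaning of a value | Increment within the current time window | Total since application start | Snapshot computed on demand |
| Reset behavior | ✅ Reset at the end of each step | ❌ Keeps accumulating | ❌ Recomputed on every request |
| Typical use | Push-based systems (time-series databases; Kafka via a custom registry) | Prometheus scraping | Monitoring system pulls actively |
| Effect of delays | A missed publish loses the data of that step | Only resolution is affected | Only resolution is affected |
| Rate calculation | The registry has already computed the delta | The backend computes it with rate() | Handled by the backend |
| Database load | Minimal (pre-aggregated) | High (needs window functions) | Moderate |
| Configuration | management.<backend>.metrics.export.step=60s (Spring Boot 3.x; management.metrics.export.<backend>.step in 2.x) | Default for the Prometheus registry | /actuator/prometheus endpoint |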
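The contrast between step and cumulative semantics can be made concrete with two toy counters. This is a simplified sketch, not Micrometer's implementation: the class and method names are hypothetical, and thread safety and time-based rollover are left out on purpose.

```java
/** Minimal counters contrasting cumulative and step (delta) semantics; not Micrometer code. */
final class CounterModes {
    private CounterModes() {
    }

    /** Cumulative: the value only grows; the backend derives a rate from two samples. */
    static final class CumulativeCounter {
        private long total;

        void increment() {
            total++;
        }

        long read() {
            return total;
        }
    }

    /** Step: publishing returns the delta of the finished window and resets it. */
    static final class StepCounter {
        private long current;

        void increment() {
            current++;
        }

        long publishAndReset() {
            long delta = current;
            current = 0;
            return delta;
        }
    }

    /** Per-second rate from two cumulative samples, as a scraping backend would compute it. */
    static double rate(long earlier, long later, double seconds) {
        return (later - earlier) / seconds;
    }
}
```

If a step publish is lost, the delta of that window is gone, whereas a missed scrape of a cumulative counter only widens the interval between two samples.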
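Configuration example (push mode). Micrometer does not ship a Kafka registry, so publishing to Kafka requires a custom step-based registry. A built-in push registry such as InfluxDB (with `micrometer-registry-influx` on the classpath) is configured as follows in Spring Boot 3.x:
```yaml
management:
  influx:
    metrics:
      export:
        enabled: true
        step: 60s          # publish the delta of each 60-second window
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: true
```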
2.3 Configuring Prometheus Pull Mode
Step 1: Add the dependency
```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
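Step 2: Expose the endpoint
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus,info
  endpoint:
    prometheus:
      enabled: true
  # Spring Boot 3.x keys; 2.x uses management.metrics.export.prometheus.*
  prometheus:
    metrics:
      export:
        enabled: true
        descriptions: true   # include HELP descriptions in the scrape output
```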
Step 3: Configure a Kubernetes ServiceMonitor (Prometheus Operator). One endpoint entry is enough: every pod behind the selected Service is scraped, so multiple instances need no additional entries.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service-monitor
  namespace: production
spec:
  selector:
    matchLabels:
      app: my-service
  endpoints:
    - port: http                    # name of the Service port
      interval: 30s
      path: /actuator/prometheus    # scrape path
      honorLabels: true
```
Verify the metrics:
```bash
# List all metric names
curl http://localhost:8080/actuator/metrics
# View the Prometheus format
curl http://localhost:8080/actuator/prometheus
# View the details of a single metric
curl http://localhost:8080/actuator/metrics/http.server.requests
```
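2.4 Implementing Custom Metrics
Scenario: tracking the order-processing rate
```java
// OrderMetrics.java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.function.Supplier;
import org.springframework.stereotype.Service;

@Service
public class OrderMetrics {
    private final Counter orderCounter;
    private final Timer orderTimer;
    private final DistributionSummary amountSummary;

    public OrderMetrics(MeterRegistry registry) {
        // Counter: total number of processed orders
        this.orderCounter = Counter.builder("order.processed.total")
                .description("Total processed orders")
                .tag("type", "online")
                .register(registry);
        // Timer: order-processing duration
        this.orderTimer = Timer.builder("order.process.duration")
                .description("Order processing duration")
                .publishPercentiles(0.5, 0.95, 0.99) // P50, P95, P99
                .register(registry);
        // Distribution summary: order amounts, recorded in yuan
        this.amountSummary = DistributionSummary.builder("order.amount.distribution")
                .description("Order amount distribution")
                .baseUnit("yuan")
                .serviceLevelObjectives(100, 500, 1000) // publish le="100", "500", "1000" buckets
                .register(registry);
    }

    // Time a unit of work with the order timer and return its result
    public <T> T time(Supplier<T> work) {
        return orderTimer.record(work);
    }

    public void recordOrder(Order order) {
        orderCounter.increment();                               // count the order
        amountSummary.record(order.getAmount().doubleValue());  // record its amount
    }
}

// OrderController.java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/orders")
public class OrderController {
    private final OrderMetrics metrics;
    private final OrderService orderService;

    public OrderController(OrderMetrics metrics, OrderService orderService) {
        this.metrics = metrics;
        this.orderService = orderService;
    }

    @PostMapping
    public ResponseEntity<Order> create(@RequestBody OrderDTO dto) {
        // Record the duration once, around the actual work
        Order order = metrics.time(() -> orderService.create(dto));
        metrics.recordOrder(order);
        return ResponseEntity.ok(order);
    }
}
```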
Sample Prometheus output (excerpt):
```
# HELP order_processed_total Total processed orders
# TYPE order_processed_total counter
order_processed_total{type="online"} 1250
# HELP order_process_duration_seconds Order processing duration
# TYPE order_process_duration_seconds summary
order_process_duration_seconds{quantile="0.5"} 0.023
order_process_duration_seconds{quantile="0.95"} 0.087
order_process_duration_seconds{quantile="0.99"} 0.156
# HELP order_amount_distribution_yuan Order amount distribution
# TYPE order_amount_distribution_yuan histogram
order_amount_distribution_yuan_bucket{le="100"} 450
order_amount_distribution_yuan_bucket{le="500"} 890
order_amount_distribution_yuan_bucket{le="1000"} 1200
order_amount_distribution_yuan_bucket{le="+Inf"} 1250
```
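The `le` buckets in the histogram output are cumulative: each bucket counts every observation less than or equal to its upper bound, which is why the numbers never decrease from one bucket to the next. A minimal sketch of that counting rule follows; the class `CumulativeBuckets` is hypothetical and is not how Micrometer stores histograms internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Counts observations into cumulative "le" buckets, the shape used by Prometheus histograms. */
final class CumulativeBuckets {
    private CumulativeBuckets() {
    }

    /** Each bucket counts all values <= its upper bound; "+Inf" counts every value. */
    static Map<String, Long> count(List<Double> values, double[] upperBounds) {
        Map<String, Long> buckets = new LinkedHashMap<>();
        for (double bound : upperBounds) {
            long n = values.stream().filter(v -> v <= bound).count();
            buckets.put(formatBound(bound), n);
        }
        buckets.put("+Inf", (long) values.size());
        return buckets;
    }

    // Render whole-number bounds without a trailing ".0" (100 rather than 100.0).
    private static String formatBound(double bound) {
        return bound == Math.rint(bound) ? String.valueOf((long) bound) : String.valueOf(bound);
    }
}
```

Because the buckets are cumulative, the number of observations in a range such as (100, 500] is obtained by subtracting the lower bucket from the upper one.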
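3. Endpoint Security: Production-Grade Protection
3.1 Risk Analysis: Exposed Sensitive Endpoints
High-risk endpoints:
- /actuator/env: exposes environment properties, which may include passwords (Spring Boot 3.x masks all values by default)
- /actuator/beans: reveals the application's bean structure
- /actuator/heapdump: produces a heap dump from which sensitive data can be extracted
- /actuator/threaddump: exposes thread information
- /actuator/mappings: lists every URL mapping
Typical incident: an unprotected /env endpoint leaks a database password, which then leads to a data breach.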
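3.2 Principle of Least Exposure
Configuration: expose only the endpoints you need and disable the rest
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus   # expose only low-risk endpoints
        exclude: env,beans,heapdump,mappings,configprops,threaddump   # exclude takes precedence over include
  endpoint:
    health:
      show-details: when-authorized   # details require authorization
      enabled: true
    info:
      enabled: true
    env:
      enabled: false                  # disabled entirely
    beans:
      enabled: false
    heapdump:
      enabled: false                  # high-risk endpoints must stay disabled
```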
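The interplay of `include` and `exclude` can be sketched as a small predicate. This is a simplification of the documented rule that `exclude` takes precedence over `include`, with `*` as a wildcard; the class `ExposureFilter` is hypothetical and unrelated to Spring Boot's internal filter classes.

```java
import java.util.Set;

/** Sketch of include/exclude resolution for endpoint exposure; a simplified model. */
final class ExposureFilter {
    private final Set<String> include;
    private final Set<String> exclude;

    ExposureFilter(Set<String> include, Set<String> exclude) {
        this.include = include;
        this.exclude = exclude;
    }

    /** An endpoint is exposed only if it is included (or "*" is) and not excluded. */
    boolean isExposed(String endpointId) {
        if (exclude.contains("*") || exclude.contains(endpointId)) {
            return false; // exclude always wins
        }
        return include.contains("*") || include.contains(endpointId);
    }
}
```

Under this rule, a configuration such as `include: "*"` with `exclude: env,heapdump` still keeps the two excluded endpoints off the web.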
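A Java-based alternative (a sketch for Spring Boot 3.x; the constructor of `WebMvcEndpointHandlerMapping` differs between versions, so the property-based configuration is the more stable choice):
```java
import java.util.List;
import java.util.Set;
import org.springframework.boot.actuate.autoconfigure.endpoint.web.WebEndpointProperties;
import org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver;
import org.springframework.boot.actuate.endpoint.web.EndpointMapping;
import org.springframework.boot.actuate.endpoint.web.EndpointMediaTypes;
import org.springframework.boot.actuate.endpoint.web.ExposableWebEndpoint;
import org.springframework.boot.actuate.endpoint.web.WebEndpointsSupplier;
import org.springframework.boot.actuate.endpoint.web.servlet.WebMvcEndpointHandlerMapping;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

@Configuration
@Profile("prod")
public class ProductionActuatorConfig {
    @Bean
    public WebMvcEndpointHandlerMapping webEndpointServletHandlerMapping(
            WebEndpointsSupplier webEndpointsSupplier,
            EndpointMediaTypes endpointMediaTypes,
            WebEndpointProperties webEndpointProperties) {
        // Filter out the dangerous endpoints
        Set<String> dangerousEndpoints = Set.of(
                "heapdump", "env", "beans", "mappings",
                "configprops", "threaddump");
        List<ExposableWebEndpoint> filtered = webEndpointsSupplier.getEndpoints().stream()
                .filter(endpoint -> !dangerousEndpoints.contains(
                        endpoint.getEndpointId().toLowerCaseString()))
                .toList();
        String basePath = webEndpointProperties.getBasePath();
        return new WebMvcEndpointHandlerMapping(
                new EndpointMapping(basePath),
                filtered,
                endpointMediaTypes,
                null, // no CORS configuration
                new EndpointLinksResolver(filtered, basePath),
                true); // register the links ("discovery") page
    }
}
```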
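3.3 Fine-Grained Control with Spring Security
Scenario: role-based access control for endpoints
```java
import static org.springframework.security.config.Customizer.withDefaults;

import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.core.userdetails.User;
import org.springframework.security.core.userdetails.UserDetailsService;
import org.springframework.security.provisioning.InMemoryUserDetailsManager;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
@EnableWebSecurity
public class ActuatorSecurityConfig {
    @Bean
    public SecurityFilterChain actuatorFilterChain(HttpSecurity http) throws Exception {
        http
            .securityMatcher(EndpointRequest.toAnyEndpoint()) // match Actuator endpoints only
            .authorizeHttpRequests(auth -> auth
                .requestMatchers(EndpointRequest.to("health", "info")).permitAll() // public
                .requestMatchers(EndpointRequest.to("metrics", "prometheus")).hasRole("MONITOR") // monitoring role
                .anyRequest().hasRole("ADMIN") // every other endpoint requires ADMIN
            )
            .sessionManagement(session -> session
                .sessionCreationPolicy(SessionCreationPolicy.STATELESS)
            )
            .httpBasic(withDefaults());
        return http.build();
    }

    // Users and roles. {noop} stores plain-text passwords and is for demos only;
    // use an encoded password or an external identity provider in production.
    @Bean
    public UserDetailsService userDetailsService() {
        return new InMemoryUserDetailsManager(
            User.withUsername("monitor")
                .password("{noop}monitor123")
                .roles("MONITOR")
                .build(),
            User.withUsername("admin")
                .password("{noop}admin123")
                .roles("ADMIN", "MONITOR")
                .build()
        );
    }
}
```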
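Property-based configuration. This is not a full equivalent: properties can define the default user and the roles allowed to see health details, but per-endpoint access rules for metrics or prometheus still require a `SecurityFilterChain`:
```yaml
spring:
  security:
    user:
      name: admin
      password: admin123     # demo value; inject a secret in production
      roles: ADMIN,MONITOR
management:
  endpoint:
    health:
      show-details: when-authorized
      roles: MONITOR         # health details require the MONITOR role
```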
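The access policy of this section (health and info are public, metrics and prometheus need MONITOR, everything else needs ADMIN) boils down to a first-match rule list. The sketch below models only that decision logic in plain Java; `EndpointAccessRules` is a hypothetical name and the code does not involve Spring Security.

```java
import java.util.Set;

/** Sketch of first-match, role-based endpoint authorization; models the policy only. */
final class EndpointAccessRules {
    private static final Set<String> PUBLIC = Set.of("health", "info");
    private static final Set<String> MONITORING = Set.of("metrics", "prometheus");

    private EndpointAccessRules() {
    }

    /** Rules are checked in order; the first matching rule decides. */
    static boolean isAllowed(String endpointId, Set<String> roles) {
        if (PUBLIC.contains(endpointId)) {
            return true; // open to everyone, including anonymous callers
        }
        if (MONITORING.contains(endpointId)) {
            return roles.contains("MONITOR");
        }
        return roles.contains("ADMIN"); // default rule for all remaining endpoints
    }
}
```

Ending the list with a restrictive default means that an endpoint added later is protected automatically rather than exposed by omission.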
3.4 Port Isolation: A Dedicated Management Port
Scenario: serve Actuator endpoints on a separate port, isolated from the application port
```yaml
server:
  port: 8080                  # application port
management:
  server:
    port: 8081                # dedicated management port
    ssl:
      enabled: true           # enforce HTTPS
      key-store: classpath:keystore.jks
      key-store-password: ${KEYSTORE_PASSWORD}
  endpoints:
    web:
      base-path: /internal    # change the default path (avoid /actuator)
      path-mapping:
        health: status        # custom endpoint paths
        metrics: performance
        prometheus: metrics-data
```
Result:
- Business API: http://app:8080/api/orders
- Health check: https://app:8081/internal/status
- Prometheus: https://app:8081/internal/metrics-data
Firewall policy: allow only operations IPs to reach port 8081.
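How the management port, `base-path`, and `path-mapping` combine into the final URLs can be sketched as a small resolver. The class `ManagementUrlResolver` is hypothetical and only mirrors the string composition; it is not part of Spring Boot.

```java
import java.util.Map;

/** Sketch of how an endpoint id resolves to a URL under a custom base path and path mapping. */
final class ManagementUrlResolver {
    private final String origin;                   // scheme, host and management port
    private final String basePath;                 // management.endpoints.web.base-path
    private final Map<String, String> pathMapping; // management.endpoints.web.path-mapping

    ManagementUrlResolver(String origin, String basePath, Map<String, String> pathMapping) {
        this.origin = origin;
        this.basePath = basePath;
        this.pathMapping = pathMapping;
    }

    /** Unmapped endpoints keep their id as the path segment. */
    String urlFor(String endpointId) {
        String segment = pathMapping.getOrDefault(endpointId, endpointId);
        return origin + basePath + "/" + segment;
    }
}
```

With the values from the configuration above, `health` resolves to `https://app:8081/internal/status`, while an unmapped endpoint such as `info` stays at `https://app:8081/internal/info`.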
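3.5 CORS Control
Scenario: a front-end monitoring dashboard calls Actuator endpoints across origins
```java
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class ActuatorCorsConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/actuator/**") // no trailing space in the pattern
                .allowedOrigins("https://monitor.company.com") // allow the monitoring domain only
                .allowedMethods("GET")
                .allowedHeaders("Authorization")
                .allowCredentials(true)
                .maxAge(3600);
    }
}
```
Alternatively, use properties. Actuator endpoints are served by their own handler mapping, so the `management.endpoints.web.cors.*` properties are the documented way to enable CORS for them:
```yaml
management:
  endpoints:
    web:
      cors:
        allowed-origins: https://monitor.company.com
        allowed-methods: GET
        allowed-headers: Authorization
        allow-credentials: true
```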
4. Production Best Practices
4.1 Environment-Specific Configuration
```yaml
# application-dev.yml
management:
  endpoints:
    web:
      exposure:
        include: "*"            # expose everything in development
  endpoint:
    health:
      show-details: always
```
```yaml
# application-prod.yml
management:
  server:
    port: 8081
    ssl:
      enabled: true
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
      base-path: /sys/monitor
      path-mapping:
        health: status
  endpoint:
    health:
      show-details: when-authorized
      probes:
        enabled: true
    metrics:
      enabled: true
    prometheus:
      enabled: true
    env:
      enabled: false
    beans:
      enabled: false
    heapdump:
      enabled: false
```
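4.2 Masking Sensitive Information
A custom InfoContributor:
```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.springframework.boot.actuate.info.Info;
import org.springframework.boot.actuate.info.InfoContributor;
import org.springframework.stereotype.Component;

@Component
public class SafeInfoContributor implements InfoContributor {
    @Override
    public void contribute(Info.Builder builder) {
        Map<String, Object> details = new LinkedHashMap<>();
        details.put("app", "order-service");
        details.put("version", "1.0.0");
        // Fall back to "default" when the variable is not set
        details.put("env", System.getenv().getOrDefault("SPRING_PROFILES_ACTIVE", "default"));
        // Deliberately contains no sensitive values (passwords, keys)
        builder.withDetail("safe", details);
    }
}
```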
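When details come from a source that may contain secrets, masking by key name is a common safeguard. The following is a minimal sketch: the class `SensitiveValueMasker` and the list of key fragments are assumptions chosen for illustration, not a Spring Boot API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Masks values whose keys look sensitive before they are published; illustrative only. */
final class SensitiveValueMasker {
    // Assumed key fragments that mark a value as sensitive.
    private static final List<String> SENSITIVE_FRAGMENTS = List.of("password", "secret", "key", "token");

    private SensitiveValueMasker() {
    }

    /** Returns a copy of the map in which sensitive values are replaced by asterisks. */
    static Map<String, Object> mask(Map<String, Object> source) {
        Map<String, Object> masked = new LinkedHashMap<>();
        source.forEach((key, value) -> masked.put(key, isSensitive(key) ? "******" : value));
        return masked;
    }

    /** Case-insensitive check of the key against the sensitive fragments. */
    static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        return SENSITIVE_FRAGMENTS.stream().anyMatch(lower::contains);
    }
}
```

Matching on fragments is intentionally broad: a harmless value that gets masked costs little, while a leaked credential does not.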
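4.3 Monitoring and Alerting Integration
Health check alerting (with Prometheus and Alertmanager). Actuator does not export a health metric by default, so `application_health_status` below is assumed to be a custom gauge published by the application (1 = UP):
```yaml
# prometheus-alert.yml
groups:
  - name: health-check
    rules:
      - alert: ServiceDown
        expr: application_health_status != 1
        for: 0s
        labels:
          severity: critical
        annotations:
          summary: "Health check failed for service {{ $labels.instance }}"
```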
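Custom metric alert (the 5xx request rate is divided by the total request rate, so that 0.1 means a 10% error ratio):
```yaml
      - alert: HighErrorRate
        expr: sum by (instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) / sum by (instance) (rate(http_server_requests_seconds_count[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "5xx error ratio above 10% on service {{ $labels.instance }}"
```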
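A 10% error threshold is a ratio, so it has to be computed from two counters: the increase in 5xx responses divided by the increase in all responses over the same window. The sketch below shows that arithmetic on raw counter samples; the class `ErrorRatio` is hypothetical, and the reset handling is a simplification of what Prometheus does for counters.

```java
/** Computes an error ratio from two samples of cumulative request counters; a simplified model. */
final class ErrorRatio {
    private ErrorRatio() {
    }

    /** Counter increase between two samples; a drop is treated as a reset (process restart). */
    static double increase(double earlier, double later) {
        return later >= earlier ? later - earlier : later;
    }

    /** Share of failed requests in the window, or 0 when there was no traffic at all. */
    static double errorRatio(double errorsEarlier, double errorsLater,
                             double totalEarlier, double totalLater) {
        double total = increase(totalEarlier, totalLater);
        return total == 0 ? 0.0 : increase(errorsEarlier, errorsLater) / total;
    }
}
```

For example, 30 new errors among 200 new requests yields a ratio of 0.15, which would exceed a 0.1 threshold regardless of how busy the service is.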
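5. Summary and Checklist
5.1 Key Points
| Capability | Key configuration | Security advice | Production recommendation |
|---|---|---|---|
| Health checks | management.endpoint.health | Details only for authorized users | Enable Kubernetes probes |
| Metrics exposure | management.endpoints.web.exposure | Expose only the endpoints you need | Use Prometheus pull |
| Micrometer | micrometer-registry-prometheus | Keep sensitive data out of metrics | Step mode for push backends |
| Access control | Spring Security @EnableWebSecurity | Role checks and port isolation | Dedicated management port plus HTTPS |
| CORS | management.endpoints.web.cors | Allow-list of domains | Strictly limit origins |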
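5.2 Production Deployment Checklist
- Exclude high-risk endpoints (env, beans, heapdump)
- Set health details to `when-authorized`
- Configure role-based access with Spring Security
- Use a dedicated management port (8081) with HTTPS
- Change the default path from `/actuator` to `/sys/monitor`
- Configure a CORS allow-list
- Mask sensitive fields (passwords, keys)
- Enable the Kubernetes liveness/readiness probes
- Configure a Prometheus ServiceMonitor
- Define alert rules (health status, error rate)
In one sentence: Actuator is the Swiss Army knife of production monitoring, but it must be kept "locked". Least exposure, strict authentication, and port isolation are what allow it to deliver its capabilities safely.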