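A Spring Boot 3.2.x Actuator Monitoring Guide: From Health Checks to an Enterprise Monitoring Stack
Imagine that one night the production alerts go off and, still half asleep, you find yourself asking:
- Which service is failing: the gateway, the order service, or the payment service?
- Which component is failing: the database, Redis, or the message queue?
- How serious is it: degraded performance, or a complete outage?
After 20 minutes of digging you discover that the Redis connection pool is exhausted. With proper monitoring, the same problem could have been pinpointed in about 5 minutes.
This is the problem Actuator solves: it gives Spring Boot applications production-ready monitoring and management, so that you can:
- see the application's health in real time
- monitor system resources and performance metrics
- adjust application settings at runtime (for example, log levels)
- locate and diagnose problems quickly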
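I. Spring Boot Actuator Core Concepts
1. What is Actuator?
Spring Boot Actuator adds production-ready features to an application and exposes monitoring and management functions as endpoints over HTTP or JMX. It mainly covers:
- Health checks: the health of the application and its dependencies
- Metrics collection: JVM, system, and application performance metrics
- Information: application configuration and environment details
- Operations: changing log levels at runtime, shutting the application down, and so on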
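2. Endpoints
Actuator exposes its monitoring data through endpoints. Spring Boot 3.2.x ships more than 20 built-in endpoints; the most commonly used ones are listed below. By default every endpoint except `shutdown` is enabled, but only `health` is exposed over HTTP; any other endpoint must be added to `management.endpoints.web.exposure.include`.
| Endpoint | Path | Description | Enabled by default | Exposed over HTTP by default |
|---|---|---|---|---|
| health | /actuator/health | Application health | ✅ | ✅ |
| info | /actuator/info | Custom application information | ✅ | ❌ |
| metrics | /actuator/metrics | Application metrics | ✅ | ❌ |
| loggers | /actuator/loggers | View and change log levels | ✅ | ❌ |
| env | /actuator/env | Environment and configuration properties | ✅ | ❌ |
| beans | /actuator/beans | All Spring beans | ✅ | ❌ |
| mappings | /actuator/mappings | Request mappings | ✅ | ❌ |
| threaddump | /actuator/threaddump | Thread dump | ✅ | ❌ |
| heapdump | /actuator/heapdump | Heap dump | ✅ | ❌ |
| shutdown | /actuator/shutdown | Graceful shutdown | ❌ | ❌ |
| prometheus | /actuator/prometheus | Metrics in Prometheus format (requires `micrometer-registry-prometheus`) | ✅ | ❌ |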
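II. Quick Start: Basic Configuration
1. Add the dependencies
```xml
<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Web support (needed for HTTP endpoints) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Prometheus registry (needed for /actuator/prometheus; version managed by Spring Boot) -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```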
2. Basic configuration
```yaml
# application.yml
spring:
  application:
    name: order-service

management:
  endpoints:
    web:
      # Base path for all endpoints
      base-path: /actuator
      exposure:
        # Endpoints exposed over HTTP (keep this list tight in production)
        include: "health,info,metrics,prometheus"
      # CORS configuration
      cors:
        allowed-origins: "http://localhost:3000"
        allowed-methods: "GET,POST"
  # Per-endpoint configuration
  endpoint:
    health:
      enabled: true
      show-details: when-authorized    # when to include details
      show-components: when-authorized
      # Registers the liveness/readiness state indicators outside Kubernetes as well
      probes:
        enabled: true
      # Health groups (they live under management.endpoint.health.group),
      # served at /actuator/health/readiness and /actuator/health/liveness
      group:
        readiness:
          include: "readinessState,db,redis,diskSpace"
        liveness:
          include: "livenessState,ping"
    info:
      enabled: true
    metrics:
      enabled: true
    prometheus:
      enabled: true
  # Health indicator configuration
  health:
    # Default for all indicators
    defaults:
      enabled: true
    redis:
      enabled: true
    db:
      enabled: true
    diskspace:
      enabled: true
      threshold: 10MB
  # Prometheus export (Spring Boot 3 property names)
  prometheus:
    metrics:
      export:
        enabled: true
        step: 1m
  # Metrics configuration
  metrics:
    enable:
      jvm: true
      system: true
      logback: true
      process: true
    distribution:
      percentiles-histogram:
        http.server.requests: true
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active:default}
```
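3. Verify the basic configuration
After starting the application, call the following endpoints:
```bash
# Health check
curl http://localhost:8080/actuator/health
# Application info
curl http://localhost:8080/actuator/info
# Names of all metrics
curl http://localhost:8080/actuator/metrics
# A specific metric
curl http://localhost:8080/actuator/metrics/jvm.memory.used
```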
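Sample response (with `show-details: when-authorized`, anonymous callers see only the top-level `status`; the details below are what an authorized caller receives):
```json
// GET /actuator/health
{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "MySQL",
        "validationQuery": "isValid()"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 500068036608,
        "free": 350067040256,
        "threshold": 10485760,
        "exists": true
      }
    },
    "ping": {
      "status": "UP"
    }
  }
}
```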
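The `diskSpace` component compares free space with `management.health.diskspace.threshold`. Spring's `10MB` means 10 × 1024 × 1024 = 10,485,760 bytes, which is the `threshold` value in the sample response, and the component stays UP while free space is at least that large. A minimal sketch of the comparison in plain Java (the class and method names are hypothetical; this is not the Actuator class itself):

```java
// Hypothetical sketch of the disk-space rule: UP while free bytes >= threshold, otherwise DOWN.
final class DiskSpaceCheck {

    // 10MB in Spring's DataSize terms is 10 * 1024 * 1024 bytes
    static final long TEN_MB = 10L * 1024 * 1024;

    static String status(long freeBytes, long thresholdBytes) {
        return freeBytes >= thresholdBytes ? "UP" : "DOWN";
    }

    public static void main(String[] args) {
        System.out.println(TEN_MB);                            // 10485760
        System.out.println(status(350_067_040_256L, TEN_MB));  // UP (free space from the sample)
        System.out.println(status(5L * 1024 * 1024, TEN_MB));  // DOWN (only 5 MB free)
    }
}
```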
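III. Core Endpoints in Depth
1. Health endpoint
Health checks are the lifeline of a microservice architecture. Spring Boot builds the overall health from individual health indicators, which can be organized into groups such as liveness and readiness.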
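1.1 Built-in health indicators
```java
import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.actuate.health.CompositeHealth;
import org.springframework.boot.actuate.health.HealthComponent;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationContext;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Demonstrates the built-in health indicators
@Component
@Slf4j
public class HealthCheckDemo {

    @EventListener(ApplicationReadyEvent.class)
    public void showHealthIndicators(ApplicationReadyEvent event) {
        ApplicationContext context = event.getApplicationContext();
        HealthEndpoint healthEndpoint = context.getBean(HealthEndpoint.class);
        // Overall status
        HealthComponent health = healthEndpoint.health();
        log.info("Overall status: {}", health.getStatus());
        // Only a CompositeHealth has child components
        if (health instanceof CompositeHealth composite) {
            log.info("Number of health indicators: {}", composite.getComponents().size());
            composite.getComponents().forEach((name, component) ->
                    log.info("Indicator: {} - status: {}", name, component.getStatus()));
        }
    }
}
```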
Output (excerpt; only some of the indicators are shown):
```text
2024-05-20 10:00:00.000 INFO c.e.demo.HealthCheckDemo : Overall status: UP
2024-05-20 10:00:00.001 INFO c.e.demo.HealthCheckDemo : Number of health indicators: 8
2024-05-20 10:00:00.002 INFO c.e.demo.HealthCheckDemo : Indicator: db - status: UP
2024-05-20 10:00:00.003 INFO c.e.demo.HealthCheckDemo : Indicator: diskSpace - status: UP
2024-05-20 10:00:00.004 INFO c.e.demo.HealthCheckDemo : Indicator: ping - status: UP
2024-05-20 10:00:00.005 INFO c.e.demo.HealthCheckDemo : Indicator: redis - status: UP
```
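The overall status is derived from the component statuses by ordering them by severity and taking the first; Spring Boot documents the default order as DOWN, OUT_OF_SERVICE, UP, UNKNOWN, so a single DOWN component makes the whole application DOWN. A hypothetical sketch of that aggregation rule using plain strings (it is not the `StatusAggregator` API; here an empty list yields UNKNOWN and statuses outside the order are ignored):

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: derive an overall status from component statuses by severity order.
final class StatusAggregation {

    // Earlier in the list = more severe = wins
    static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    static String aggregate(List<String> componentStatuses) {
        return componentStatuses.stream()
                .filter(ORDER::contains)                       // ignore statuses outside the order
                .min(Comparator.comparingInt(ORDER::indexOf))  // most severe status first
                .orElse("UNKNOWN");                            // no components at all
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of("UP", "UP", "UP")));    // UP
        System.out.println(aggregate(List.of("UP", "DOWN", "UP")));  // DOWN
    }
}
```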
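1.2 Custom health indicators
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.data.redis.connection.RedisConnection;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.StringRedisTemplate;

// Custom Redis health indicator. It is registered as a @Bean in HealthIndicatorConfig,
// so it is deliberately not annotated with @Component (that would register it twice).
@Slf4j
public class CustomRedisHealthIndicator implements HealthIndicator {

    private final StringRedisTemplate redisTemplate;

    public CustomRedisHealthIndicator(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    @Override
    public Health health() {
        try {
            // 1. Check connectivity with PING
            long start = System.currentTimeMillis();
            String result = redisTemplate.execute((RedisCallback<String>) RedisConnection::ping);
            long responseTime = System.currentTimeMillis() - start;
            if (!"PONG".equals(result)) {
                return Health.down()
                        .withDetail("error", "Unexpected Redis response: " + result)
                        .build();
            }
            // 2. Read INFO through the template so the connection is released afterwards.
            //    The default INFO sections include server (redis_version), clients and memory;
            //    INFO "memory" alone would not contain connected_clients or redis_version.
            Properties info = redisTemplate.execute(
                    (RedisCallback<Properties>) connection -> connection.serverCommands().info());
            long usedMemory = Long.parseLong(info.getProperty("used_memory", "0"));
            long maxMemory = Long.parseLong(info.getProperty("maxmemory", "0"));
            // maxmemory = 0 means "no limit", so no usage ratio can be computed
            double memoryUsage = maxMemory > 0 ? (double) usedMemory / maxMemory * 100 : 0;
            // 3. Number of connected clients
            int connectedClients = Integer.parseInt(info.getProperty("connected_clients", "0"));

            // Build the health status
            Health.Builder builder = Health.up()
                    .withDetail("response_time", responseTime + "ms")
                    .withDetail("memory_usage", String.format("%.2f%%", memoryUsage))
                    .withDetail("connected_clients", connectedClients)
                    .withDetail("version", info.getProperty("redis_version"));
            // Collect warnings in one list so the second warning does not overwrite the first
            List<String> warnings = new ArrayList<>();
            if (responseTime > 100) {
                warnings.add("slow response time");
            }
            if (memoryUsage > 80) {
                warnings.add("high memory usage");
            }
            if (!warnings.isEmpty()) {
                builder.withDetail("warnings", warnings);
            }
            return builder.build();
        } catch (Exception e) {
            log.error("Redis health check failed", e);
            // Health.down(e) already adds an "error" detail with the exception
            return Health.down(e).build();
        }
    }
}
```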
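The indicator relies on Redis `INFO`, whose raw output is plain text grouped into sections (`# Server`, `# Clients`, `# Memory`, ...) with one `key:value` pair per line; `maxmemory:0` means no memory limit. The sketch below parses such text with the JDK alone to show why the usage ratio needs a guard. `RedisInfo` is a hypothetical helper (not part of Spring Data Redis), the sample text is made up and trimmed, and returning -1 for "no limit" is a choice of this sketch.

```java
import java.util.Properties;

// Hypothetical helper: parses raw Redis INFO text and derives a memory usage ratio.
final class RedisInfo {

    static Properties parse(String infoText) {
        Properties props = new Properties();
        for (String rawLine : infoText.split("\r?\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                                   // skip blank lines and "# Section" headers
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                props.setProperty(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return props;
    }

    // Percentage of maxmemory in use, or -1 when maxmemory is 0 (no limit configured)
    static double memoryUsagePercent(Properties info) {
        long used = Long.parseLong(info.getProperty("used_memory", "0"));
        long max = Long.parseLong(info.getProperty("maxmemory", "0"));
        return max > 0 ? (double) used / max * 100 : -1;
    }

    public static void main(String[] args) {
        String sample = "# Server\r\nredis_version:7.0.0\r\n# Clients\r\nconnected_clients:12\r\n"
                + "# Memory\r\nused_memory:1048576\r\nmaxmemory:4194304\r\n";
        Properties info = parse(sample);
        System.out.println(info.getProperty("redis_version")); // 7.0.0
        System.out.println(memoryUsagePercent(info));          // 25.0
    }
}
```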
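Register the custom indicators:
```java
import java.sql.Connection;
import javax.sql.DataSource;

import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.core.StringRedisTemplate;

// Assumes the application has both Redis and a DataSource. @ConditionalOnBean is not used here:
// user configuration is processed before auto-configuration, so the condition would not see
// the auto-configured RedisTemplate/DataSource beans.
@Configuration
public class HealthIndicatorConfig {

    // Component name is derived from the bean name: "customRedisHealthIndicator" -> "customRedis"
    @Bean
    public HealthIndicator customRedisHealthIndicator(StringRedisTemplate redisTemplate) {
        return new CustomRedisHealthIndicator(redisTemplate);
    }

    // Database connection-pool health check ("datasourceHealthIndicator" -> "datasource")
    @Bean
    public HealthIndicator datasourceHealthIndicator(DataSource dataSource) {
        return () -> {
            try (Connection conn = dataSource.getConnection()) {
                boolean isValid = conn.isValid(5); // 5-second timeout
                // Report DOWN when validation fails, including for Hikari pools
                Health.Builder builder = isValid ? Health.up() : Health.down();
                builder.withDetail("validation", isValid);
                if (dataSource instanceof HikariDataSource hikari) {
                    HikariPoolMXBean pool = hikari.getHikariPoolMXBean();
                    builder.withDetail("active_connections", pool.getActiveConnections())
                            .withDetail("idle_connections", pool.getIdleConnections())
                            .withDetail("total_connections", pool.getTotalConnections())
                            .withDetail("threads_waiting", pool.getThreadsAwaitingConnection())
                            .withDetail("validation_timeout", "5s");
                }
                return builder.build();
            } catch (Exception e) {
                return Health.down(e).build();
            }
        };
    }
}
```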
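The component name used in `/actuator/health/{name}` comes from the bean name: Spring Boot strips a trailing `HealthIndicator` or `HealthContributor` suffix, so the bean `customRedisHealthIndicator` is served as `customRedis` and `datasourceHealthIndicator` as `datasource`. A hypothetical re-implementation of that naming rule, for illustration only (this sketch matches the suffix case-insensitively):

```java
import java.util.Locale;

// Hypothetical sketch of the bean-name -> health component name rule.
final class HealthNames {

    private static final String[] SUFFIXES = { "healthindicator", "healthcontributor" };

    static String componentName(String beanName) {
        String lower = beanName.toLowerCase(Locale.ENGLISH);
        for (String suffix : SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return beanName.substring(0, beanName.length() - suffix.length());
            }
        }
        return beanName; // no recognised suffix: the bean name is used as-is
    }

    public static void main(String[] args) {
        System.out.println(componentName("customRedisHealthIndicator")); // customRedis
        System.out.println(componentName("datasourceHealthIndicator"));  // datasource
    }
}
```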
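Test it. The indicator registered as `customRedisHealthIndicator` is served at `/actuator/health/customRedis`, while `/actuator/health/redis` remains Spring Boot's built-in Redis indicator:
```bash
curl http://localhost:8080/actuator/health/customRedis
```
Sample response:
```json
{
  "status": "UP",
  "details": {
    "response_time": "15ms",
    "memory_usage": "45.23%",
    "connected_clients": 12,
    "version": "7.0.0"
  }
}
```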
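2. Metrics endpoint
Spring Boot uses Micrometer as its metrics facade, so the same instrumentation can be exported to many monitoring systems, Prometheus among them.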
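2.1 Core metric categories
```yaml
# Metrics configuration in detail
management:
  metrics:
    # Enable/disable meters by meter-name prefix (everything defaults to true)
    enable:
      # JVM metrics: covers jvm.memory.*, jvm.gc.*, jvm.threads.*, jvm.classes.*
      jvm: true
      # System metrics
      system.cpu: true
      disk: true
      # Process metrics: process.cpu.*, process.uptime, ...
      process: true
      # Application metrics
      http.server.requests: true
      http.client.requests: true
      cache: true
      jdbc: true
      hikaricp: true
      kafka: true
      # Logging metrics
      logback: true
    # Common tags added to every meter
    tags:
      application: ${spring.application.name}
      # spring.cloud.client.ip-address is provided by Spring Cloud Commons
      instance: ${spring.cloud.client.ip-address:unknown}:${server.port:8080}
      region: ${cloud.region:unknown}
      zone: ${cloud.zone:unknown}
```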
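`management.metrics.enable` is keyed by meter-name prefix, and the most specific matching key decides: with `jvm: false` and `jvm.memory: true`, `jvm.memory.used` stays enabled while `jvm.gc.pause` is turned off. A hypothetical sketch of that lookup, walking from the full meter name to ever shorter dot-separated prefixes; meters without any matching key are treated as enabled:

```java
import java.util.Map;

// Hypothetical sketch: resolve an enable flag by walking meter-name prefixes, most specific first.
final class MeterEnableLookup {

    static boolean isEnabled(Map<String, Boolean> enable, String meterName) {
        String candidate = meterName;
        while (true) {
            Boolean value = enable.get(candidate);
            if (value != null) {
                return value;                      // most specific configured prefix wins
            }
            int lastDot = candidate.lastIndexOf('.');
            if (lastDot < 0) {
                return true;                       // nothing configured: enabled by default
            }
            candidate = candidate.substring(0, lastDot);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> enable = Map.of("jvm", false, "jvm.memory", true);
        System.out.println(isEnabled(enable, "jvm.memory.used"));      // true
        System.out.println(isEnabled(enable, "jvm.gc.pause"));         // false
        System.out.println(isEnabled(enable, "http.server.requests")); // true
    }
}
```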
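2.2 Custom business metrics
```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

// Business metrics for the order service
@Component
@Slf4j
public class OrderMetrics {

    private final MeterRegistry registry;
    // Counter: number of orders created
    private final Counter orderCreatedCounter;
    // Timers: order processing time, one per outcome (tags must have a small, bounded set of values)
    private final Timer successTimer;
    private final Timer failureTimer;
    // Distribution summary: order amount distribution
    private final DistributionSummary orderAmountSummary;
    // Gauge source: orders currently in progress
    private final AtomicInteger activeOrders = new AtomicInteger(0);

    public OrderMetrics(MeterRegistry registry) {
        this.registry = registry;
        // The "application" tag is added by management.metrics.tags
        orderCreatedCounter = Counter.builder("order.created")
                .description("Number of orders created")
                .register(registry);
        successTimer = processTimer("true");
        failureTimer = processTimer("false");
        orderAmountSummary = DistributionSummary.builder("order.amount")
                .description("Order amount distribution")
                .baseUnit("CNY") // amounts are recorded in yuan (no extra scale factor)
                .register(registry);
        // A gauge rather than a counter, because the value goes both up and down
        Gauge.builder("order.active.count", activeOrders, AtomicInteger::get)
                .description("Number of orders currently in progress")
                .register(registry);
    }

    private Timer processTimer(String success) {
        return Timer.builder("order.process.time")
                .description("Order processing time")
                .tag("success", success)
                .publishPercentiles(0.5, 0.95, 0.99) // p50, p95, p99
                .publishPercentileHistogram()        // buckets for histogram_quantile in Prometheus
                .register(registry);
    }

    /** Records a created order. */
    public void recordOrderCreated(BigDecimal amount) {
        orderCreatedCounter.increment();
        orderAmountSummary.record(amount.doubleValue());
        log.info("Order-created metrics recorded, amount: {}", amount);
    }

    /** Starts timing one order and marks it as in progress. */
    public Timer.Sample startOrderProcessing() {
        activeOrders.incrementAndGet();
        return Timer.start(registry);
    }

    /** Stops timing. The order ID is only logged: as a tag it would create one time series per order. */
    public void endOrderProcessing(Timer.Sample sample, String orderId, boolean success) {
        sample.stop(success ? successTimer : failureTimer);
        activeOrders.decrementAndGet();
        log.info("Order processed: {}, success: {}", orderId, success);
    }

    /** Returns a snapshot of the current values. */
    public Map<String, Object> getCurrentMetrics() {
        Map<String, Object> metrics = new LinkedHashMap<>();
        metrics.put("orders_created", orderCreatedCounter.count());
        metrics.put("avg_process_time_ms", successTimer.mean(TimeUnit.MILLISECONDS));
        metrics.put("active_orders", activeOrders.get());
        return metrics;
    }
}
```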
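A timer keeps running aggregates, which is why `/actuator/metrics/order.process.time` reports `COUNT`, `TOTAL_TIME` and `MAX`, and why the mean is simply `TOTAL_TIME / COUNT`. The class below is a deliberately simplified, hypothetical stand-in, not Micrometer's implementation (real timers also decay `MAX` over a time window and feed percentile histograms), that shows those three aggregates:

```java
import java.time.Duration;

// Hypothetical, simplified timer: keeps count, total and max, like the statistics a timer reports.
final class SimpleTimer {

    private long count;
    private long totalNanos;
    private long maxNanos;

    synchronized void record(Duration duration) {
        long nanos = duration.toNanos();
        count++;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    synchronized long count() { return count; }

    synchronized double totalSeconds() { return totalNanos / 1e9; }

    synchronized double maxSeconds() { return maxNanos / 1e9; }

    // Mean = TOTAL_TIME / COUNT (0 before anything is recorded)
    synchronized double meanSeconds() { return count == 0 ? 0 : totalSeconds() / count; }

    public static void main(String[] args) {
        SimpleTimer timer = new SimpleTimer();
        timer.record(Duration.ofMillis(200));
        timer.record(Duration.ofMillis(400));
        timer.record(Duration.ofMillis(900));
        System.out.printf("count=%d total=%.1fs max=%.1fs mean=%.1fs%n",
                timer.count(), timer.totalSeconds(), timer.maxSeconds(), timer.meanSeconds());
        // count=3 total=1.5s max=0.9s mean=0.5s
    }
}
```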
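Using it in the business code:
```java
import io.micrometer.core.instrument.Timer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@Slf4j
public class OrderService {

    private final OrderMetrics orderMetrics;
    private final OrderRepository orderRepository; // the application's (assumed) Spring Data repository

    public OrderService(OrderMetrics orderMetrics, OrderRepository orderRepository) {
        this.orderMetrics = orderMetrics;
        this.orderRepository = orderRepository;
    }

    @Transactional
    public Order createOrder(CreateOrderRequest request) {
        // Start timing (also marks the order as in progress)
        Timer.Sample timer = orderMetrics.startOrderProcessing();
        try {
            // Business logic...
            Order order = new Order();
            order.setAmount(request.getAmount());
            order.setStatus(OrderStatus.CREATED);
            // Persist the order
            orderRepository.save(order);
            // Record metrics
            orderMetrics.recordOrderCreated(request.getAmount());
            // Stop timing (success)
            orderMetrics.endOrderProcessing(timer, String.valueOf(order.getId()), true);
            return order;
        } catch (RuntimeException e) {
            // Stop timing (failure)
            orderMetrics.endOrderProcessing(timer, "unknown", false);
            throw e;
        }
    }
}
```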
Query the metrics endpoint:
```bash
# Names of all metrics
curl http://localhost:8080/actuator/metrics
# Specific metrics
curl http://localhost:8080/actuator/metrics/order.created
curl http://localhost:8080/actuator/metrics/order.process.time
```
Sample responses:
```json
// GET /actuator/metrics/order.created
{
  "name": "order.created",
  "description": "Number of orders created",
  "baseUnit": null,
  "measurements": [
    {
      "statistic": "COUNT",
      "value": 1250.0
    }
  ],
  "availableTags": [
    {
      "tag": "application",
      "values": ["order-service"]
    }
  ]
}
// GET /actuator/metrics/order.process.time
{
  "name": "order.process.time",
  "description": "Order processing time",
  "baseUnit": "seconds",
  "measurements": [
    {
      "statistic": "COUNT",
      "value": 1250.0
    },
    {
      "statistic": "TOTAL_TIME",
      "value": 625.5
    },
    {
      "statistic": "MAX",
      "value": 2.1
    }
  ]
}
```
The mean processing time follows from these measurements: TOTAL_TIME / COUNT = 625.5 / 1250 ≈ 0.50 s. This endpoint reports only the meter's measurements; the p50/p95/p99 values configured with `publishPercentiles` are not part of this response and are exported through `/actuator/prometheus` instead.
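3. Info endpoint
The info endpoint exposes static information about the application.
3.1 Basic configuration
```yaml
# application.yml
management:
  info:
    # Git information (needs git.properties, e.g. from git-commit-id-maven-plugin)
    git:
      mode: full
    # Build information (needs META-INF/build-info.properties from the build-info goal
    # of spring-boot-maven-plugin)
    build:
      enabled: true
    # Publishes the info.* properties below (disabled by default)
    env:
      enabled: true
    # Java information
    java:
      enabled: true
    # OS information
    os:
      enabled: true

# Custom information
info:
  app:
    name: "@project.name@"
    version: "@project.version@"
    description: "@project.description@"
  team:
    name: "R&D Department"
    contact: "tech@example.com"
  policy:
    security-level: "high"
    compliance: "ISO27001"
```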
3.2 Programmatic InfoContributor
```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.info.Info;
import org.springframework.boot.actuate.info.InfoContributor;
import org.springframework.context.ApplicationContext;
import org.springframework.stereotype.Component;

// Custom info contributor
@Component
public class CustomInfoContributor implements InfoContributor {

    @Value("${spring.application.name}")
    private String appName;

    @Autowired
    private ApplicationContext context;

    @Override
    public void contribute(Info.Builder builder) {
        // 1. Application runtime information
        builder.withDetail("application", Map.of(
                "name", appName,
                "contextPath", context.getApplicationName(),
                "startupTime", getStartupTime(),
                "uptime", getUptime()
        ));

        // 2. Bean statistics, grouped by the package of each bean's type
        String[] beanNames = context.getBeanDefinitionNames();
        Map<String, Long> beanStats = Arrays.stream(beanNames)
                .collect(Collectors.groupingBy(this::categorize, Collectors.counting()));
        builder.withDetail("beans", Map.of(
                "total", beanNames.length,
                "statistics", beanStats
        ));

        // 3. Thread information
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        builder.withDetail("threads", Map.of(
                "total", threadBean.getThreadCount(),
                "daemon", threadBean.getDaemonThreadCount(),
                "peak", threadBean.getPeakThreadCount()
        ));

        // 4. System information
        Runtime runtime = Runtime.getRuntime();
        builder.withDetail("system", Map.of(
                "processors", runtime.availableProcessors(),
                "memory", Map.of(
                        "total", runtime.totalMemory() / 1024 / 1024 + "MB",
                        "free", runtime.freeMemory() / 1024 / 1024 + "MB",
                        "max", runtime.maxMemory() / 1024 / 1024 + "MB"
                )
        ));

        // 5. Business information
        builder.withDetail("business", Map.of(
                "features", List.of("Order management", "Payment processing", "Inventory management"),
                "sla", "99.9%",
                "data-retention", "30 days"
        ));
    }

    // Classifies a bean by the package of its type
    private String categorize(String beanName) {
        Class<?> type = context.getType(beanName);
        if (type == null) {
            return "unknown";
        }
        String className = type.getName();
        if (className.startsWith("org.springframework.boot")) {
            return "Spring Boot";
        }
        if (className.startsWith("org.springframework")) {
            return "Spring Framework";
        }
        return "Other";
    }

    private String getStartupTime() {
        try {
            long startTime = ManagementFactory.getRuntimeMXBean().getStartTime();
            return Instant.ofEpochMilli(startTime)
                    .atZone(ZoneId.systemDefault())
                    .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
        } catch (Exception e) {
            return "unknown";
        }
    }

    private String getUptime() {
        try {
            long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
            long seconds = uptime / 1000;
            long days = seconds / 86400;
            long hours = (seconds % 86400) / 3600;
            long minutes = (seconds % 3600) / 60;
            return String.format("%dd %dh %dm", days, hours, minutes);
        } catch (Exception e) {
            return "unknown";
        }
    }
}
```
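The day/hour/minute arithmetic in `getUptime()` can also be written with `java.time.Duration`; the `to...Part()` methods need Java 9 or later, which Spring Boot 3's Java 17 baseline covers. A small sketch with a hypothetical helper name, producing the same "Nd Nh Nm" shape:

```java
import java.time.Duration;

// Hypothetical helper: formats an uptime in milliseconds as "Nd Nh Nm".
final class UptimeFormat {

    static String format(long uptimeMillis) {
        Duration d = Duration.ofMillis(uptimeMillis);
        // toDays() is the whole-day count; the *Part() methods return the remainder fields
        return String.format("%dd %dh %dm", d.toDays(), d.toHoursPart(), d.toMinutesPart());
    }

    public static void main(String[] args) {
        long millis = Duration.ofDays(2).plusHours(3).plusMinutes(15).toMillis();
        System.out.println(format(millis)); // 2d 3h 15m
    }
}
```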
Query the info endpoint:
```bash
curl http://localhost:8080/actuator/info
```
Sample response (only the part produced by CustomInfoContributor is shown):
```json
{
  "application": {
    "name": "order-service",
    "contextPath": "",
    "startupTime": "2024-05-20T10:00:00",
    "uptime": "2d 3h 15m"
  },
  "beans": {
    "total": 156,
    "statistics": {
      "Spring Boot": 45,
      "Spring Framework": 89,
      "Other": 22
    }
  },
  "threads": {
    "total": 45,
    "daemon": 20,
    "peak": 50
  },
  "system": {
    "processors": 8,
    "memory": {
      "total": "256MB",
      "free": "128MB",
      "max": "512MB"
    }
  },
  "business": {
    "features": ["Order management", "Payment processing", "Inventory management"],
    "sla": "99.9%",
    "data-retention": "30 days"
  }
}
```
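IV. Enterprise Monitoring Setup
1. Monitoring architecture
```text
Spring Boot application
`-- Actuator endpoints
    |-- Health check /health --> Kubernetes probes --> automatic restart / autoscaling
    |-- Metrics /metrics     --> Prometheus --> alert rules --> Alertmanager --> email / DingTalk / WeChat alerts
    |                                `--> Grafana dashboards --> real-time monitoring
    `-- Info /info           --> configuration management
```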
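2. Kubernetes integration
2.1 Health probes
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: order-service:1.0.0
          ports:
            - name: http
              containerPort: 8080
          # If management.server.port is set (e.g. 9090), point the probes at that port
          # Readiness probe: the pod receives traffic only while this succeeds
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          # Liveness probe: the container is restarted when this keeps failing
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          # Startup probe: holds off the other probes until startup succeeds (up to 30 x 10 s)
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
```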
2.2 ServiceMonitor configuration (Prometheus Operator)
```yaml
# servicemonitor.yaml
# Selects a Service labeled app: order-service that exposes a port named "http"
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: order-service
  labels:
    release: prometheus   # must match the serviceMonitorSelector of your Prometheus resource
spec:
  selector:
    matchLabels:
      app: order-service
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 30s
      scrapeTimeout: 10s
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
  namespaceSelector:
    any: true
```
3. Prometheus configuration
```yaml
# prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'spring-boot-apps'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['order-service:8080', 'payment-service:8080', 'user-service:8080']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
    # Pod-level labels such as __meta_kubernetes_pod_name exist only with kubernetes_sd_configs,
    # not with static_configs; inside Kubernetes, prefer the ServiceMonitor above
```
4. Grafana dashboard
Dashboard JSON for monitoring the Spring Boot application. Spring Boot does not export its health status as a metric, so the first panel shows Prometheus' own `up` series (1 when the last scrape succeeded):
```json
{
  "dashboard": {
    "title": "Spring Boot Application Monitoring",
    "panels": [
      {
        "title": "Application up (scrape status)",
        "type": "stat",
        "targets": [{
          "expr": "up{job=\"spring-boot-apps\"}",
          "legendFormat": "{{instance}}"
        }]
      },
      {
        "title": "JVM memory usage",
        "type": "graph",
        "targets": [{
          "expr": "sum(jvm_memory_used_bytes{application=\"order-service\", area=\"heap\"}) by (instance)",
          "legendFormat": "Heap used {{instance}}"
        }, {
          "expr": "sum(jvm_memory_max_bytes{application=\"order-service\", area=\"heap\"}) by (instance)",
          "legendFormat": "Heap max {{instance}}"
        }]
      },
      {
        "title": "HTTP request rate (QPS)",
        "type": "graph",
        "targets": [{
          "expr": "rate(http_server_requests_seconds_count{application=\"order-service\"}[5m])",
          "legendFormat": "{{method}} {{uri}} {{status}}"
        }]
      },
      {
        "title": "HTTP request latency",
        "type": "graph",
        "targets": [{
          "expr": "histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket{application=\"order-service\"}[5m])) by (le, uri, method, status))",
          "legendFormat": "P95 {{method}} {{uri}}"
        }]
      },
      {
        "title": "Database connection pool",
        "type": "graph",
        "targets": [{
          "expr": "hikaricp_connections_active{application=\"order-service\"}",
          "legendFormat": "Active connections"
        }, {
          "expr": "hikaricp_connections_idle{application=\"order-service\"}",
          "legendFormat": "Idle connections"
        }]
      }
    ]
  }
}
```
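The P95 panel relies on `histogram_quantile`, which estimates a quantile from cumulative `le` bucket counts: it finds the first bucket whose cumulative count reaches the target rank (q × total) and interpolates linearly inside that bucket. The sketch below re-implements that estimation in plain Java for illustration; the class is hypothetical, the bucket counts are made up, and edge cases Prometheus handles (such as non-monotonic buckets) are ignored.

```java
// Hypothetical sketch of histogram_quantile-style estimation over cumulative buckets.
final class HistogramQuantile {

    /**
     * @param q           quantile in [0, 1], e.g. 0.95
     * @param upperBounds bucket upper bounds ("le"), ascending, the last one +Inf
     * @param cumulative  cumulative counts per bucket, same length as upperBounds
     */
    static double estimate(double q, double[] upperBounds, double[] cumulative) {
        int n = upperBounds.length;
        double total = cumulative[n - 1];
        if (total == 0) {
            return Double.NaN;                           // no observations
        }
        double rank = q * total;
        int b = 0;
        while (b < n - 1 && cumulative[b] < rank) {
            b++;                                         // first bucket whose count reaches the rank
        }
        if (b == n - 1) {
            return upperBounds[n - 2];                   // rank in the +Inf bucket: highest finite bound
        }
        double bucketStart = b == 0 ? 0 : upperBounds[b - 1];
        double countBelow = b == 0 ? 0 : cumulative[b - 1];
        double countInBucket = cumulative[b] - countBelow;
        // Linear interpolation inside the bucket
        return bucketStart + (upperBounds[b] - bucketStart) * (rank - countBelow) / countInBucket;
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 100, 100};            // 50 requests <= 100 ms, 90 <= 500 ms, ...
        System.out.println(estimate(0.95, le, counts));  // 0.75
    }
}
```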
V. Security Configuration
1. Protecting the endpoints
```yaml
# application.yml
# Users and URL-level access rules are defined in the Spring Security configuration below;
# once a UserDetailsService bean exists, spring.security.user.* is no longer used.
management:
  endpoints:
    web:
      exposure:
        # In production, expose only the endpoints you need
        include: "health,info,prometheus"
        exclude: "env,beans,loggers,heapdump,threaddump"
  endpoint:
    health:
      # Callers with this role count as "authorized" for show-details
      roles: "ACTUATOR_USER"
      show-details: when-authorized
    env:
      # If env is ever exposed, only administrators see unmasked values
      roles: "ACTUATOR_ADMIN"
      show-values: when-authorized
```
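2. Spring Security configuration
```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.annotation.Order;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configurers.AbstractHttpConfigurer;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.core.userdetails.User;
import org.springframework.security.core.userdetails.UserDetails;
import org.springframework.security.provisioning.InMemoryUserDetailsManager;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
@EnableWebSecurity
public class ActuatorSecurityConfig {

    @Bean
    @Order(1)
    public SecurityFilterChain actuatorSecurityFilterChain(HttpSecurity http)
            throws Exception {
        http
            // Applies to /actuator/** only; adjust if the base path or management port changes
            .securityMatcher("/actuator/**")
            .authorizeHttpRequests(authz -> authz
                // Health checks are public (used by Kubernetes probes)
                .requestMatchers("/actuator/health/**").permitAll()
                // Prometheus endpoint (the scraper must send basic-auth credentials)
                .requestMatchers("/actuator/prometheus").hasRole("ACTUATOR_USER")
                // Info endpoint
                .requestMatchers("/actuator/info").hasRole("ACTUATOR_USER")
                // Sensitive endpoints require the admin role; "/**" also covers sub-paths
                // such as /actuator/loggers/{name}, which can change log levels
                .requestMatchers("/actuator/env/**", "/actuator/beans",
                        "/actuator/loggers/**", "/actuator/heapdump",
                        "/actuator/threaddump").hasRole("ACTUATOR_ADMIN")
                // Any other endpoint requires authentication
                .anyRequest().authenticated()
            )
            .httpBasic(Customizer.withDefaults())
            .sessionManagement(session -> session
                .sessionCreationPolicy(SessionCreationPolicy.STATELESS)
            )
            .csrf(AbstractHttpConfigurer::disable);
        return http.build();
    }

    @Bean
    public InMemoryUserDetailsManager userDetailsService() {
        // {noop} stores plain-text passwords: demonstration only; in production use hashed
        // passwords (e.g. {bcrypt}) loaded from a secret store
        UserDetails user = User.builder()
                .username("monitor")
                .password("{noop}monitor123")
                .roles("ACTUATOR_USER")
                .build();
        UserDetails admin = User.builder()
                .username("admin")
                .password("{noop}admin123")
                .roles("ACTUATOR_ADMIN", "ACTUATOR_USER")
                .build();
        return new InMemoryUserDetailsManager(user, admin);
    }
}
```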
3. IP whitelist
```java
import java.io.IOException;
import java.util.List;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.apache.commons.net.util.SubnetUtils; // requires the commons-net dependency
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class ActuatorIpFilter extends OncePerRequestFilter {

    private final List<String> allowedIps = List.of(
            "10.0.0.0/8",      // private network
            "192.168.0.0/16",  // private network
            "127.0.0.1",       // localhost (IPv4)
            "172.16.0.0/12"    // private network (includes Docker's default bridge range)
    );

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain)
            throws ServletException, IOException {
        String requestUri = request.getRequestURI();
        // Filter only Actuator endpoints; health stays open for Kubernetes probes
        if (requestUri.startsWith("/actuator") &&
                !requestUri.startsWith("/actuator/health")) {
            String clientIp = getClientIp(request);
            if (!isIpAllowed(clientIp)) {
                response.setStatus(HttpStatus.FORBIDDEN.value());
                response.getWriter().write("Access denied from IP: " + clientIp);
                return;
            }
        }
        filterChain.doFilter(request, response);
    }

    private String getClientIp(HttpServletRequest request) {
        // Use the TCP peer address. Headers such as X-Forwarded-For are supplied by the client
        // and can be forged to bypass this whitelist. Behind a proxy you control, configure
        // server.forward-headers-strategy instead; getRemoteAddr() then returns the forwarded
        // client address.
        return request.getRemoteAddr();
    }

    private boolean isIpAllowed(String ip) {
        for (String allowedIp : allowedIps) {
            try {
                if (allowedIp.contains("/")) {
                    // CIDR notation
                    SubnetUtils utils = new SubnetUtils(allowedIp);
                    utils.setInclusiveHostCount(true); // also accept network/broadcast addresses
                    if (utils.getInfo().isInRange(ip)) {
                        return true;
                    }
                } else if (allowedIp.equals(ip)) {
                    return true;
                }
            } catch (IllegalArgumentException e) {
                // Not an IPv4 address (e.g. IPv6 "0:0:0:0:0:0:0:1"): no match for this entry
            }
        }
        return false;
    }
}
```
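If you would rather not add commons-net just for `SubnetUtils`, IPv4 CIDR matching fits in a few lines of JDK-only code: convert both addresses to unsigned 32-bit values and compare them under the prefix mask. A hypothetical helper, IPv4 only (IPv6 entries such as `::1` would need a separate 128-bit variant); it parses the dotted quad by hand so that no DNS lookup can ever be triggered:

```java
// Hypothetical IPv4-only CIDR matcher built on plain JDK arithmetic.
final class Ipv4Cidr {

    // True if `ip` lies inside `cidr` (e.g. "10.0.0.0/8"); a plain address is treated as /32
    static boolean matches(String ip, String cidr) {
        String[] parts = cidr.split("/", -1);
        if (parts.length > 2) {
            return false;
        }
        int prefix;
        try {
            prefix = parts.length == 2 ? Integer.parseInt(parts[1]) : 32;
        } catch (NumberFormatException e) {
            return false;
        }
        if (prefix < 0 || prefix > 32) {
            return false;
        }
        long address = toLong(ip);
        long network = toLong(parts[0]);
        if (address < 0 || network < 0) {
            return false;                                // not a literal IPv4 address
        }
        // Mask with the top `prefix` bits set; prefix 0 matches every address
        long mask = prefix == 0 ? 0 : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (address & mask) == (network & mask);
    }

    // Parses dotted-quad text into an unsigned 32-bit value, or returns -1 if invalid
    private static long toLong(String ip) {
        String[] octets = ip.split("\\.", -1);
        if (octets.length != 4) {
            return -1;
        }
        long value = 0;
        for (String octet : octets) {
            if (!octet.matches("\\d{1,3}")) {
                return -1;
            }
            int n = Integer.parseInt(octet);
            if (n > 255) {
                return -1;
            }
            value = (value << 8) | n;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(matches("10.1.2.3", "10.0.0.0/8"));      // true
        System.out.println(matches("172.32.0.1", "172.16.0.0/12")); // false: the /12 ends at 172.31.255.255
        System.out.println(matches("127.0.0.1", "127.0.0.1"));      // true
    }
}
```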
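VI. Enterprise Best Practices
1. Per-environment configuration
```yaml
# application-dev.yml
management:
  endpoints:
    web:
      exposure:
        include: "*"            # expose every endpoint in development
  endpoint:
    health:
      show-details: always
  tracing:
    enabled: false              # no tracing in development
```
```yaml
# application-prod.yml
management:
  endpoints:
    web:
      exposure:
        include: "health,info,prometheus,metrics"   # strict list in production
      base-path: /internal/actuator                 # non-default path (defense in depth only)
  endpoint:
    health:
      show-details: never                           # no details in production
    shutdown:
      enabled: false                                # keep the shutdown endpoint disabled
  server:
    port: 9090                                      # separate management port
  tracing:
    enabled: true
    sampling:
      probability: 0.1                              # sample 10% of requests in production
```
With these production settings the endpoints move to port 9090 under `/internal/actuator`, so the Kubernetes probe paths and port, the Prometheus `metrics_path`, and the `/actuator/**` security matchers shown earlier must be updated to match.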
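`probability: 0.1` means roughly one request in ten gets a recorded trace. The decision has to be made once per trace so that all of its spans agree, and one common way to achieve that is to derive it from the trace ID. The sketch below is a hypothetical illustration of that idea (the mixing step is the SplitMix64 finalizer), not the sampler Micrometer Tracing actually uses:

```java
// Hypothetical trace-ID-based probability sampler (illustration only).
final class ProbabilitySampler {

    private final long threshold; // out of 10,000

    ProbabilitySampler(double probability) {
        this.threshold = Math.round(probability * 10_000);
    }

    // Same trace ID -> same decision, so every span of a trace is kept or dropped together
    boolean isSampled(long traceId) {
        return Math.floorMod(mix(traceId), 10_000L) < threshold;
    }

    // SplitMix64 finalizer: spreads consecutive IDs evenly over the 64-bit range
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        ProbabilitySampler sampler = new ProbabilitySampler(0.1);
        int sampled = 0;
        for (long id = 0; id < 100_000; id++) {
            if (sampler.isSampled(id)) {
                sampled++;
            }
        }
        System.out.println("sampled " + sampled + " of 100000 traces");
    }
}
```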
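2. Alerting rules
```yaml
# Prometheus alerting rules
groups:
  - name: spring-boot-alerts
    rules:
      - alert: SpringBootAppDown
        expr: up{job="spring-boot-apps"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Spring Boot application down"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute"

      - alert: HighMemoryUsage
        # Aggregate per instance so that $labels.instance is available in the annotation
        expr: sum by (application, instance) (jvm_memory_used_bytes{area="heap"}) / sum by (application, instance) (jvm_memory_max_bytes{area="heap"}) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage too high"
          description: "Heap usage on {{ $labels.instance }} is above 80% (current: {{ $value }}%)"

      - alert: HighGCTime
        # Seconds of GC pause per second of wall-clock time: 0.1 = 10% of the time spent paused
        expr: sum by (application, instance) (rate(jvm_gc_pause_seconds_sum[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GC pauses too long"
          description: "{{ $labels.instance }} spends more than 10% of its time in GC pauses"

      - alert: HighErrorRate
        # Sum both sides per instance; dividing the raw series would pair them by status label
        expr: sum by (application, instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) / sum by (application, instance) (rate(http_server_requests_seconds_count[5m])) * 100 > 5
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "HTTP error rate too high"
          description: "5xx error rate on {{ $labels.instance }} is above 5% (current: {{ $value }}%)"

      - alert: HighLatency
        expr: histogram_quantile(0.95, sum by (le, application, instance) (rate(http_server_requests_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HTTP request latency too high"
          description: "P95 latency on {{ $labels.instance }} is above 1 second (current: {{ $value }} s)"
```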
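The `HighErrorRate` rule divides the summed 5xx request rate by the summed total request rate of one instance. With made-up per-status rates, the arithmetic looks like this (hypothetical helper; note that exactly 5% does not fire, because the rule uses `>`):

```java
import java.util.Map;

// Hypothetical helper mirroring sum(rate(5xx)) / sum(rate(all)) * 100 for one instance.
final class ErrorRate {

    static double percent(Map<String, Double> requestsPerSecondByStatus) {
        double total = 0;
        double errors = 0;
        for (Map.Entry<String, Double> e : requestsPerSecondByStatus.entrySet()) {
            total += e.getValue();
            if (e.getKey().startsWith("5")) {     // same intent as status=~"5.."
                errors += e.getValue();
            }
        }
        return total == 0 ? 0 : errors * 100 / total;
    }

    public static void main(String[] args) {
        Map<String, Double> rates = Map.of("200", 93.0, "404", 2.0, "500", 3.0, "503", 2.0);
        double pct = percent(rates);
        System.out.println(pct + "% -> fires: " + (pct > 5)); // 5.0% -> fires: false
    }
}
```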
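3. Performance tuning
```yaml
management:
  metrics:
    # Distribution settings for HTTP server request metrics
    distribution:
      slo:
        http.server.requests: 100ms, 200ms, 500ms, 1s, 2s
      percentiles-histogram:
        http.server.requests: true
      maximum-expected-value:
        http.server.requests: 10s
  endpoint:
    # Response caching for endpoint reads
    health:
      cache:
        time-to-live: 10s
      # Liveness/readiness probe groups
      probes:
        enabled: true
    metrics:
      cache:
        time-to-live: 30s
    prometheus:
      cache:
        time-to-live: 15s
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

# Health-check timeouts are governed by the client libraries themselves
spring:
  datasource:
    hikari:
      connection-test-query: "SELECT 1"   # reported as the db indicator's validation query
      connection-timeout: 5000            # ms to wait for a pooled connection
  data:
    redis:
      timeout: 3s                         # command timeout, also applies to the health PING
```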
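The `slo` boundaries add fixed buckets to `http.server.requests`, each counting the requests that finished within that boundary, cumulatively like Prometheus `le` buckets. That makes ratios such as "share of requests under 500 ms" directly queryable. A hypothetical sketch of the counting with made-up latencies:

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: cumulative request counts per SLO boundary, like "le" buckets.
final class SloBuckets {

    static long[] cumulativeCounts(List<Duration> latencies, List<Duration> boundaries) {
        long[] counts = new long[boundaries.size()];
        for (Duration latency : latencies) {
            for (int i = 0; i < boundaries.size(); i++) {
                if (latency.compareTo(boundaries.get(i)) <= 0) {
                    counts[i]++;                  // finished within boundary i ("le" is inclusive)
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Duration> boundaries = List.of(Duration.ofMillis(100), Duration.ofMillis(200),
                Duration.ofMillis(500), Duration.ofSeconds(1), Duration.ofSeconds(2));
        List<Duration> latencies = List.of(Duration.ofMillis(50), Duration.ofMillis(150),
                Duration.ofMillis(600), Duration.ofSeconds(3));
        System.out.println(Arrays.toString(cumulativeCounts(latencies, boundaries)));
        // [1, 2, 2, 3, 3]
    }
}
```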
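VII. Common Problems and Solutions
Problem 1: An endpoint returns 404
Cause: the endpoint is disabled or not exposed over HTTP (by default only `health` is exposed). If `management.endpoints.web.base-path` or `management.server.port` is set, also check that you are calling the right path and port.
Solution:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: "health,info,metrics" # list the endpoints you need explicitly
  endpoint:
    health:
      enabled: true
    info:
      enabled: true
    metrics:
      enabled: true
```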
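Problem 2: The health check reports DOWN
Cause: a dependency is unavailable.
Debugging helper:
```java
import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.actuate.health.CompositeHealth;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthComponent;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.boot.actuate.health.Status;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class HealthCheckDebugger {

    @EventListener(ApplicationReadyEvent.class)
    public void debugHealthStatus(ApplicationReadyEvent event) {
        HealthEndpoint healthEndpoint = event.getApplicationContext().getBean(HealthEndpoint.class);
        HealthComponent health = healthEndpoint.health();
        // Compare statuses with equals(); child components exist only on a CompositeHealth
        if (Status.DOWN.equals(health.getStatus()) && health instanceof CompositeHealth composite) {
            log.error("Application health is DOWN");
            composite.getComponents().forEach((name, component) -> {
                if (Status.DOWN.equals(component.getStatus())) {
                    // Only a leaf Health carries details; nested groups are reported by name
                    Object details = component instanceof Health leaf
                            ? leaf.getDetails() : "(nested components)";
                    log.error("Failed component: {} - details: {}", name, details);
                }
            });
        }
    }
}
```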
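Problem 3: Metric values look wrong
Cause: misconfigured meters or sampling issues.
Validation helper:
```java
import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// @Scheduled only runs if @EnableScheduling is present on a configuration class
@Component
@Slf4j
public class MetricsValidator {

    private final MeterRegistry meterRegistry;

    public MetricsValidator(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Scheduled(fixedDelay = 60000) // check once a minute
    public void validateMetrics() {
        log.info("Number of registered meters: {}", meterRegistry.getMeters().size());
        // Check a key metric: there is one timer per tag combination (method, uri, status, ...)
        for (Timer timer : meterRegistry.find("http.server.requests").timers()) {
            log.info("HTTP request timer {}: count={}, mean={}ms", timer.getId().getTags(),
                    timer.count(), timer.mean(TimeUnit.MILLISECONDS));
        }
    }
}
```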
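Monitoring is a means, not an end.
With a complete monitoring system we can spot problems early, locate faults quickly, and keep improving performance, ultimately giving users a stable and reliable service.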