etcd常用监控

通过部署etcd-exporter+Prometheus,然后配置etcd相关告警可以及时发现etcd集群风险

常见监控项目

1. etcd集群无leader

Etcd cluster have no leader

复制代码
  - alert:EtcdNoLeader
    expr: etcd_server_has_leader ==0
for:0m
    labels:
      severity: critical
    annotations:
      summary:EtcdnoLeader(instance {{ $labels.instance }})
      description: "Etcd cluster have no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"```
2. etcd grpc请求慢

GRPC requests slowing down, 99th percentile is over 0.15s

复制代码
  - alert:EtcdGrpcRequestsSlow
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m]))by(grpc_service, grpc_method, le))>0.15
for:2m
    labels:
      severity: warning
    annotations:
      summary:Etcd GRPC requests slow (instance {{ $labels.instance }})
      description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3. etcd http请求慢

HTTP requests slowing down, 99th percentile is over 0.15s

复制代码
  - alert: EtcdHttpRequestsSlow
    expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
      description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
4. Etcd member communication slow

Etcd member communication slowing down, 99th percentile is over 0.15s

复制代码
  - alert: EtcdMemberCommunicationSlow
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd member communication slow (instance {{ $labels.instance }})
      description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
5. Etcd high fsync durations

Etcd WAL fsync duration increasing, 99th percentile is over 0.5s

复制代码
  - alert: EtcdHighFsyncDurations
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd high fsync durations (instance {{ $labels.instance }})
      description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $
相关推荐
运维行者_14 分钟前
Applications Manager中的Redis监控
大数据·服务器·数据库·人工智能·网络协议
悦数图数据库3 小时前
图数据库选型指南 2026:从架构、性能、AI 适配三个维度看 悦数科技
数据库·人工智能·架构
handler014 小时前
【MySQL】常用命令总结(库与表增删查改)
运维·数据库·mysql·命令·总结
week@eight4 小时前
Linux - Doris
linux·运维·数据库·mysql
cdbqss15 小时前
VB2026 菜单生成基类 BqGetMenuStrip
数据库·经验分享·学习·oracle·vb
洛水水5 小时前
Redis 分布式锁详解:实现与缺陷
数据库·redis·分布式
韶博雅5 小时前
oracle中表和列转大写
数据库·oracle
暴躁小师兄数据学院6 小时前
【AI大数据工程师特训笔记】第04讲:PostgreSQL 数据库内置函数详解
大数据·数据库·笔记·ai·语言模型
苏渡苇6 小时前
Spring Cloud Alibaba:将 Sentinel 熔断限流规则持久化到 Nacos 配置中心
数据库·spring boot·mysql·spring cloud·nacos·sentinel·持久化