etcd常用监控

通过部署etcd-exporter+Prometheus,然后配置etcd相关告警可以及时发现etcd集群风险

常见监控项目

1. etcd集群无leader

Etcd cluster have no leader

  - alert:EtcdNoLeader
    expr: etcd_server_has_leader ==0
for:0m
    labels:
      severity: critical
    annotations:
      summary:EtcdnoLeader(instance {{ $labels.instance }})
      description: "Etcd cluster have no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"```
2. etcd grpc请求慢

GRPC requests slowing down, 99th percentile is over 0.15s

  - alert:EtcdGrpcRequestsSlow
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m]))by(grpc_service, grpc_method, le))>0.15
for:2m
    labels:
      severity: warning
    annotations:
      summary:Etcd GRPC requests slow (instance {{ $labels.instance }})
      description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3. etcd http请求慢

HTTP requests slowing down, 99th percentile is over 0.15s

  - alert: EtcdHttpRequestsSlow
    expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
      description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
4. Etcd member communication slow

Etcd member communication slowing down, 99th percentile is over 0.15s

  - alert: EtcdMemberCommunicationSlow
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd member communication slow (instance {{ $labels.instance }})
      description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
5. Etcd high fsync durations

Etcd WAL fsync duration increasing, 99th percentile is over 0.5s

  - alert: EtcdHighFsyncDurations
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd high fsync durations (instance {{ $labels.instance }})
      description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $
相关推荐
qq_5298353520 分钟前
Redis作为缓存和数据库的数据一致性问题
数据库·redis·缓存
山猪打不过家猪5 小时前
ASP.NET Core Clean Architecture
java·数据库·asp.net
qwy7152292581636 小时前
13-R数据重塑
服务器·数据库·r语言
Bio Coder6 小时前
R语言安装生物信息数据库包
开发语言·数据库·r语言
钊兵7 小时前
数据库驱动免费下载(Oracle、Mysql、达梦、Postgresql)
数据库·mysql·postgresql·oracle·达梦·驱动
weixin_425878238 小时前
Redis复制性能优化利器:深入解析replica-lazy-flush参数
数据库·redis·性能优化
左灯右行的爱情8 小时前
Redis数据结构总结-listPack
数据结构·数据库·redis
隔壁老王1569 小时前
mysql实时同步到es
数据库·mysql·elasticsearch
想要打 Acm 的小周同学呀9 小时前
Redis三剑客解决方案
数据库·redis·缓存