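Common etcd monitoring

Deploying etcd-exporter together with Prometheus and configuring etcd alert rules makes it possible to detect risks in an etcd cluster promptly.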
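
As a rough illustration of the Prometheus side of this setup, the fragment below sketches a scrape job and a rule-file reference. It is only a sketch under assumptions: the target addresses, the port, the `http` scheme, and the file name `etcd-alerts.yml` are hypothetical placeholders, and a TLS-protected endpoint would additionally need `scheme: https` and a `tls_config` block.

```yaml
# prometheus.yml (fragment). All host names and the rule file name are
# hypothetical placeholders, not values taken from this article.
scrape_configs:
  - job_name: etcd
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - etcd-0.example.internal:2379
          - etcd-1.example.internal:2379
          - etcd-2.example.internal:2379

# Alert rules such as the ones listed in this article are loaded from rule files.
rule_files:
  - etcd-alerts.yml
```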

Common monitoring items

1. etcd cluster has no leader

The etcd cluster has no leader

  - alert: EtcdNoLeader
    expr: etcd_server_has_leader == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Etcd no leader (instance {{ $labels.instance }})
      description: "Etcd cluster has no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
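
The alert entry above is a list item, so it needs an enclosing `groups:` / `rules:` structure before Prometheus can load it. A minimal sketch of such a rule file follows; the file name `etcd-alerts.yml` and the group name `etcd` are assumptions made for illustration.

```yaml
# etcd-alerts.yml (hypothetical file name)
groups:
  - name: etcd            # arbitrary group name chosen for this sketch
    rules:
      # Each "- alert:" entry from this article goes under this key.
      - alert: EtcdNoLeader
        expr: etcd_server_has_leader == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Etcd no leader (instance {{ $labels.instance }})
          description: "Etcd cluster has no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

A file laid out like this can be syntax-checked with `promtool check rules etcd-alerts.yml` before Prometheus is reloaded.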
2. etcd gRPC requests slow

GRPC requests slowing down, 99th percentile is over 0.15s

  - alert: EtcdGrpcRequestsSlow
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m])) by (grpc_service, grpc_method, le)) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd GRPC requests slow (instance {{ $labels.instance }})
      description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3. etcd HTTP requests slow

HTTP requests slowing down, 99th percentile is over 0.15s

  - alert: EtcdHttpRequestsSlow
    expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
      description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
4. Etcd member communication slow

Etcd member communication slowing down, 99th percentile is over 0.15s

  - alert: EtcdMemberCommunicationSlow
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd member communication slow (instance {{ $labels.instance }})
      description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
5. Etcd high fsync durations

Etcd WAL fsync duration increasing, 99th percentile is over 0.5s

  - alert: EtcdHighFsyncDurations
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd high fsync durations (instance {{ $labels.instance }})
      description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"