Common etcd Monitoring

Deploying etcd-exporter together with Prometheus and then configuring etcd alert rules lets you detect risks in an etcd cluster early.
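
As a rough illustration of the Prometheus side of this setup, a scrape job might look like the sketch below. The job name and target addresses are hypothetical, and the port assumes the metrics are served on etcd's default client port; adjust both to wherever your deployment actually exposes them.

```yaml
# prometheus.yml (fragment): a minimal sketch, not a complete configuration.
scrape_configs:
  - job_name: etcd                # hypothetical job name
    metrics_path: /metrics
    # scheme: https               # uncomment if the endpoint requires TLS
    # tls_config:
    #   ca_file: /path/to/ca.crt  # hypothetical paths
    #   cert_file: /path/to/client.crt
    #   key_file: /path/to/client.key
    static_configs:
      - targets:
          - etcd-0.example.internal:2379   # hypothetical addresses
          - etcd-1.example.internal:2379
          - etcd-2.example.internal:2379
```

Listing every member as its own target keeps the `instance` label distinct per member, which the alert summaries rely on.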

Common monitoring items
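
Each rule in this article is a list item that belongs under the `rules:` key of a Prometheus rule group. The sketch below shows that enclosing structure; the file name `etcd-rules.yml` and the group name `etcd` are assumptions chosen for illustration, and annotations are omitted for brevity.

```yaml
# etcd-rules.yml: a minimal sketch of the rule-file layout (hypothetical file name).
groups:
  - name: etcd                    # hypothetical group name
    rules:
      - alert: EtcdNoLeader
        expr: etcd_server_has_leader == 0
        for: 0m
        labels:
          severity: critical
      # ...the remaining alert rules go here as further list items
```

Such a file is then referenced from the `rule_files:` list in `prometheus.yml`, and it can be validated before loading with `promtool check rules etcd-rules.yml`.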

1. etcd cluster has no leader

The etcd cluster has no leader

  - alert: EtcdNoLeader
    expr: etcd_server_has_leader == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Etcd no leader (instance {{ $labels.instance }})
      description: "Etcd cluster has no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2. etcd gRPC requests slow

gRPC requests slowing down, 99th percentile is over 0.15s

  - alert: EtcdGrpcRequestsSlow
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m])) by (grpc_service, grpc_method, le)) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd GRPC requests slow (instance {{ $labels.instance }})
      description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
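
One detail worth noting: the expression above aggregates with `by (grpc_service, grpc_method, le)`, so the result no longer carries an `instance` label and `{{ $labels.instance }}` in the summary renders empty. If per-member alerts are wanted, one possible variant keeps `instance` in the grouping. This is a sketch only; the alert name `EtcdGrpcRequestsSlowPerInstance` is made up for illustration.

```yaml
  # Variant (assumption, not from the original rule set): keep the instance
  # label so the alert identifies which etcd member is slow.
  - alert: EtcdGrpcRequestsSlowPerInstance
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m])) by (instance, grpc_service, grpc_method, le)) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd gRPC requests slow (instance {{ $labels.instance }})
      description: "gRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

The trade-off is more alert series: one per member, service, and method instead of one per service and method across the whole cluster.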

3. etcd HTTP requests slow

HTTP requests slowing down, 99th percentile is over 0.15s

  - alert: EtcdHttpRequestsSlow
    expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
      description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4. etcd member communication slow

Etcd member communication slowing down, 99th percentile is over 0.15s

  - alert: EtcdMemberCommunicationSlow
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd member communication slow (instance {{ $labels.instance }})
      description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

5. etcd high fsync durations

Etcd WAL fsync duration increasing, 99th percentile is over 0.5s

  - alert: EtcdHighFsyncDurations
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd high fsync durations (instance {{ $labels.instance }})
      description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $