etcd常用监控

通过部署etcd-exporter+Prometheus,然后配置etcd相关告警可以及时发现etcd集群风险

常见监控项目

1. etcd集群无leader

Etcd cluster have no leader

复制代码
  - alert:EtcdNoLeader
    expr: etcd_server_has_leader ==0
for:0m
    labels:
      severity: critical
    annotations:
      summary:EtcdnoLeader(instance {{ $labels.instance }})
      description: "Etcd cluster have no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"```
2. etcd grpc请求慢

GRPC requests slowing down, 99th percentile is over 0.15s

复制代码
  - alert:EtcdGrpcRequestsSlow
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m]))by(grpc_service, grpc_method, le))>0.15
for:2m
    labels:
      severity: warning
    annotations:
      summary:Etcd GRPC requests slow (instance {{ $labels.instance }})
      description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3. etcd http请求慢

HTTP requests slowing down, 99th percentile is over 0.15s

复制代码
  - alert: EtcdHttpRequestsSlow
    expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
      description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
4. Etcd member communication slow

Etcd member communication slowing down, 99th percentile is over 0.15s

复制代码
  - alert: EtcdMemberCommunicationSlow
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd member communication slow (instance {{ $labels.instance }})
      description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
5. Etcd high fsync durations

Etcd WAL fsync duration increasing, 99th percentile is over 0.5s

复制代码
  - alert: EtcdHighFsyncDurations
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd high fsync durations (instance {{ $labels.instance }})
      description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $
相关推荐
user_admin_god2 小时前
企业级管理系统的站内信怎么轻量级优雅实现
java·大数据·数据库·spring boot
百***22122 小时前
mysql的分区表
数据库·mysql
humors2213 小时前
服务端开发案例(不定期更新)
java·数据库·后端·mysql·mybatis·excel
Wang's Blog3 小时前
MySQL: 服务器性能优化全面指南:参数配置与数据库设计的最佳实践
服务器·数据库·mysql
码农101号3 小时前
Mysql主从架构的搭建
数据库·mysql·架构
cqsztech3 小时前
ORACLE数据库中如何找到过去某个时间某个表被谁修改了
数据库·oracle
好记忆不如烂笔头abc3 小时前
sql评估存储的速度和稳定性
数据库·sql
小鹏linux4 小时前
《openGauss安全架构与数据全生命周期防护实践:从技术体系到行业落地》
数据库·opengauss·gaussdb
朝新_4 小时前
【实战】动态 SQL + 统一 Result + 登录校验:图书管理系统(下)
xml·java·数据库·sql·mybatis
装不满的克莱因瓶4 小时前
什么是脏读、幻读、不可重复读?Mysql的隔离级别是什么?
数据库·mysql·事务·隔离级别·不可重复读·幻读·脏读