etcd常用监控

通过部署etcd-exporter+Prometheus,然后配置etcd相关告警可以及时发现etcd集群风险

常见监控项目

1. etcd集群无leader

Etcd cluster have no leader

复制代码
  - alert:EtcdNoLeader
    expr: etcd_server_has_leader ==0
for:0m
    labels:
      severity: critical
    annotations:
      summary:EtcdnoLeader(instance {{ $labels.instance }})
      description: "Etcd cluster have no leader\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"```
2. etcd grpc请求慢

GRPC requests slowing down, 99th percentile is over 0.15s

复制代码
  - alert:EtcdGrpcRequestsSlow
    expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[1m]))by(grpc_service, grpc_method, le))>0.15
for:2m
    labels:
      severity: warning
    annotations:
      summary:Etcd GRPC requests slow (instance {{ $labels.instance }})
      description: "GRPC requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3. etcd http请求慢

HTTP requests slowing down, 99th percentile is over 0.15s

复制代码
  - alert: EtcdHttpRequestsSlow
    expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd HTTP requests slow (instance {{ $labels.instance }})
      description: "HTTP requests slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
4. Etcd member communication slow

Etcd member communication slowing down, 99th percentile is over 0.15s

复制代码
  - alert: EtcdMemberCommunicationSlow
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[1m])) > 0.15
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd member communication slow (instance {{ $labels.instance }})
      description: "Etcd member communication slowing down, 99th percentile is over 0.15s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
5. Etcd high fsync durations

Etcd WAL fsync duration increasing, 99th percentile is over 0.5s

复制代码
  - alert: EtcdHighFsyncDurations
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[1m])) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Etcd high fsync durations (instance {{ $labels.instance }})
      description: "Etcd WAL fsync duration increasing, 99th percentile is over 0.5s\n  VALUE = {{ $
相关推荐
GDAL1 小时前
Node.js v22.5+ 官方 SQLite 模块全解析:从入门到实战
数据库·sqlite·node.js
DCTANT2 小时前
【原创】国产化适配-全量迁移MySQL数据到OpenGauss数据库
java·数据库·spring boot·mysql·opengauss
AI、少年郎4 小时前
Oracle 进阶语法实战:从多维分析到数据清洗的深度应用(第四课)
数据库·oracle
赤橙红的黄4 小时前
自定义线程池-实现任务0丢失的处理策略
数据库·spring
DataGear5 小时前
如何在DataGear 5.4.1 中快速制作SQL服务端分页的数据表格看板
javascript·数据库·sql·信息可视化·数据分析·echarts·数据可视化
weixin_438335405 小时前
分布式锁实现方式:基于Redis的分布式锁实现(Spring Boot + Redis)
数据库·redis·分布式
码不停蹄的玄黓5 小时前
MySQL Undo Log 深度解析:事务回滚与MVCC的核心功臣
数据库·mysql·undo log·回滚日志
Qdgr_5 小时前
价值实证:数字化转型标杆案例深度解析
大数据·数据库·人工智能
数据狐(DataFox)5 小时前
SQL参数化查询:防注入与计划缓存的双重优势
数据库·sql·缓存
Arthurmoo6 小时前
Linux系统之MySQL数据库基础
linux·数据库·mysql