Prometheus 6: Prometheus + Email Alerting on Rocky Linux 9 (Docker Monitoring)

I. Prerequisites (already completed)

✅ Rocky Linux 9

✅ Docker installed and running

✅ Prometheus 3.12.0 (installed)

✅ Alertmanager 0.33.0 (installed, 163 mailbox working)

✅ Node Exporter (host monitoring)

✅ MySQL Exporter (installed)

II. Deploy cAdvisor using the official Quick Start

1. Start cAdvisor

VERSION=0.55.1 && \
(sudo docker rm -f cadvisor 2>/dev/null || true) && \
sudo docker pull ghcr.io/google/cadvisor:$VERSION && \
sudo docker run -d \
  --name=cadvisor \
  --restart=always \
  --privileged \
  --device=/dev/kmsg \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker:/var/lib/docker:ro \
  -v /dev/disk:/dev/disk:ro \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8080:8080 \
  ghcr.io/google/cadvisor:$VERSION

2. Verify cAdvisor

sudo docker ps

curl -s http://localhost:8080/metrics | head
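
Right after `docker run`, cAdvisor may need a few seconds before it answers requests, so a single `curl` can fail even though the container is healthy. The following is a minimal sketch of a retry helper for that case. The function name `wait_until` is hypothetical, and the example probe assumes cAdvisor's `/healthz` endpoint on the port published above.

```bash
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe exactly as given; stop at the first success.
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Every attempt failed.
  return 1
}

# Example probe (assumption: cAdvisor listens on localhost:8080):
# wait_until 10 2 curl -fsS -o /dev/null http://localhost:8080/healthz
```

If the helper returns non-zero, `sudo docker logs cadvisor` is usually the next place to look.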

III. Add Docker monitoring to Prometheus

1. Edit the Prometheus configuration

sudo vim /etc/prometheus/prometheus.yml
Append the following job under scrape_configs:
  # Docker container monitoring (cAdvisor official Quick Start)
  - job_name: "docker"
    static_configs:
      - targets: ["localhost:8080"]

✅ Complete scrape_configs example

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"
  # Host monitoring (Node Exporter)
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]
  # MySQL monitoring (mysqld_exporter)
  - job_name: "mysql"
    static_configs:
      - targets: ["localhost:9104"]
  # Alertmanager monitoring
  - job_name: "alertmanager"
    static_configs:
      - targets: ["localhost:9093"]
        labels:
          app: "alertmanager"
  # ↓↓ Docker monitoring ↓↓
  - job_name: "docker"
    static_configs:
      - targets: ["localhost:8080"]
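
Because the new job is appended by hand, it is easy to paste it twice, and Prometheus refuses to load a config that contains two scrape jobs with the same `job_name`. The sketch below scans a config file for repeated job names. It is a plain-text scan rather than a YAML parser, it assumes each job is written as `- job_name: "..."` on a single line as in the example above, and the function name `duplicate_jobs` is hypothetical.

```bash
# Print every job_name that occurs more than once in a Prometheus config file.
# Usage: duplicate_jobs <config-file>
duplicate_jobs() {
  # Match lines of the form `- job_name: "name"`, strip the double quotes,
  # then keep only the names that appear more than once.
  awk '$1 == "-" && $2 == "job_name:" { gsub(/"/, "", $3); print $3 }' "$1" \
    | sort \
    | uniq -d
}

# Example, using the path from this post:
# duplicate_jobs /etc/prometheus/prometheus.yml
```

Empty output means no job name is repeated.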

IV. Docker alerting rules (production-grade)

  # ======================
  # Docker container monitoring alerts
  # (append under the existing `groups:` key of the rules file)
  # ======================
  - name: docker
    rules:
      - alert: DockerDown
        expr: up{job="docker"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Docker monitoring service (cAdvisor) is unavailable"
          description: "cAdvisor {{ $labels.instance }} has been unreachable for more than 1 minute"

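      - alert: ContainerHighCPU
        # Standalone cAdvisor labels Docker containers with `name`; the `container`
        # label is only set by the Kubernetes kubelet. 0.8 means 0.8 CPU cores.
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.8
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Container CPU usage is high"
          description: "Container {{ $labels.name }} has used more than 0.8 CPU cores (80% of one core) for 3 minutes"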

      - alert: ContainerHighMemory
        # `> 0` drops containers without a memory limit (limit reported as 0),
        # which would otherwise divide by zero and always fire.
        expr: sum by (name) (container_memory_usage_bytes{name!=""})
             / (sum by (name) (container_spec_memory_limit_bytes{name!=""}) > 0) > 0.8
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Container memory usage is high"
          description: "Container {{ $labels.name }} is using more than 80% of its memory limit"

V. Check and restart

promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus
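
Run one after the other, the two commands above restart Prometheus even when `promtool` reports an error, which would leave the service down on a broken config. A minimal sketch of chaining them so the restart happens only after a successful check is shown here. The function name `restart_if_valid` is hypothetical, and the restart command is passed in as arguments so the function can be tried with a harmless stand-in first.

```bash
# Restart Prometheus only if promtool accepts the config.
# Usage: restart_if_valid <config-file> <restart-command> [args...]
restart_if_valid() {
  config=$1
  shift
  if promtool check config "$config"; then
    # Validation passed: run the restart command that was passed in.
    "$@"
  else
    echo "config check failed; Prometheus was not restarted" >&2
    return 1
  fi
}

# Example, using the paths from this post:
# restart_if_valid /etc/prometheus/prometheus.yml sudo systemctl restart prometheus
```

A dry run such as `restart_if_valid /etc/prometheus/prometheus.yml echo "would restart"` exercises the check without touching the service.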

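VI. Verify Docker monitoring

1. Open in a browser:

http://<server-IP>:9090 → Status → Targets (named "Target health" in the Prometheus 3.x UI); the docker job should show UP

2. Trigger a test Docker alert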

sudo docker stop cadvisor

Wait 1 to 2 minutes, then check:

Prometheus → Alerts → DockerDown = Firing

The 163 mailbox receives the Docker alert email
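
The firing state can also be checked from the command line, which helps when the email does not arrive and it is unclear whether Prometheus or Alertmanager is at fault. Prometheus exposes active alerts as the synthetic `ALERTS` series, so an instant query for `ALERTS{alertname="DockerDown",alertstate="firing"}` returns a non-empty result only while the alert fires. The sketch below inspects such a response. The function name `query_has_result` is hypothetical, and the string match assumes the compact JSON that the HTTP API normally returns.

```bash
# Read a /api/v1/query response on stdin.
# Returns 0 if the query succeeded with a non-empty result,
# 1 if the result is empty, 2 if the response is not a success.
query_has_result() {
  body=$(cat)
  case $body in
    *'"status":"success"'*) ;;
    *) return 2 ;;
  esac
  case $body in
    *'"result":[]'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example (assumption: Prometheus listens on localhost:9090):
# curl -fsS -G http://localhost:9090/api/v1/query \
#   --data-urlencode 'query=ALERTS{alertname="DockerDown",alertstate="firing"}' \
#   | query_has_result && echo "DockerDown is firing"
```

If this reports firing but no email arrives, the problem is more likely in the Alertmanager routing or SMTP settings than in the rule itself.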

To recover:

sudo docker start cadvisor

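✅ A Resolved email arrives (this requires send_resolved: true in the Alertmanager email receiver, which defaults to false)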
