Prometheus Monitoring System: Learning and Practice Report
1. Learning objectives achieved
| Monitored target  | Exporter                      | Port | Status      |
|-------------------|-------------------------------|------|-------------|
| Host system       | node_exporter 1.3.1           | 9100 | UP          |
| Prometheus itself | built-in                      | 9090 | UP          |
| MySQL             | mysqld_exporter 0.14.0        | 9104 | UP          |
| Nginx             | nginx-vts-exporter 0.10.3     | 9913 | UP          |
| Alertmanager      | alertmanager 0.27.0           | 9093 | UP          |
| Alert channel     | DingTalk custom robot webhook | -    | In progress |
2. Full deployment process
1. Prometheus server deployment (core)
cd /opt
tar xf prometheus-2.35.0.linux-amd64.tar.gz
mv prometheus-2.35.0.linux-amd64 /usr/local/prometheus
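# Main configuration file
cat > /usr/local/prometheus/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Required by the alerting steps later in this report:
# load the rule files and forward alerts to Alertmanager on port 9093
rule_files:
  - /etc/prometheus/rules/*.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          instance: "prometheus-self"
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: "192.168.164.131"
          env: "practice"
  - job_name: mysql
    static_configs:
      - targets: ["localhost:9104"]
  - job_name: nginx
    static_configs:
      - targets: ["localhost:9913"]
EOF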
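Before starting the service, it can help to validate the file that was just written. The following is a minimal sketch, assuming the promtool binary from the same tarball sits next to prometheus in /usr/local/prometheus. The function name check_prom_config and the PROMTOOL override variable are illustrative and not part of the original setup.

```shell
# Validate a Prometheus config file with promtool before (re)starting the service.
# PROMTOOL and the default paths are assumptions based on the install layout above.
check_prom_config() {
  local promtool_bin="${PROMTOOL:-/usr/local/prometheus/promtool}"
  local config_file="${1:-/usr/local/prometheus/prometheus.yml}"
  if [ ! -x "$promtool_bin" ]; then
    echo "promtool not found at $promtool_bin" >&2
    return 2
  fi
  # promtool exits non-zero when the file has syntax or semantic errors
  "$promtool_bin" check config "$config_file"
}
```

A possible use is to gate the service start on the check, for example: check_prom_config && systemctl restart prometheus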
# systemd service (with all key flags set)
cat > /etc/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--storage.tsdb.path=/data/prometheus \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now prometheus
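Once the unit is started, Prometheus may need a moment before it serves requests. The sketch below polls the readiness endpoint (/-/ready) with curl. The function name wait_ready and its default retry values are assumptions chosen for illustration.

```shell
# Poll Prometheus' readiness endpoint until it answers or the retries run out.
# Arguments: URL, maximum number of tries, delay between tries in seconds.
wait_ready() {
  local url="${1:-http://localhost:9090/-/ready}"
  local tries="${2:-10}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors, e.g. 503 while the server is still starting
    if curl -fsS --max-time 3 "$url" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}
```

Calling wait_ready with no arguments checks the local server on port 9090, as configured in the unit file above.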
2. Exporter quick deployment (all from local packages)
# Node Exporter
tar xf node_exporter-1.3.1.linux-amd64.tar.gz
cp node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
# Write the unit file first, then enable and start the service
cat > /etc/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=Node Exporter
[Service]
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now node_exporter
# MySQL + mysqld_exporter
yum install -y mariadb-server
systemctl enable --now mariadb
mysql -e "ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';"
mysql -uroot -p123456 -e "CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'exporter123'; GRANT PROCESS,REPLICATION CLIENT,SELECT ON *.* TO 'exporter'@'localhost';"
tar xf mysqld_exporter-0.14.0.linux-amd64.tar.gz
cp mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter /usr/local/bin/
cat > /etc/systemd/system/mysqld_exporter.service <<'EOF'
[Unit]
Description=MySQL Exporter
[Service]
Environment="DATA_SOURCE_NAME=exporter:exporter123@(127.0.0.1:3306)/"
ExecStart=/usr/local/bin/mysqld_exporter
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now mysqld_exporter
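To confirm an exporter is actually serving data, its /metrics page can be inspected directly. The helper below is a small sketch that extracts one unlabelled metric from the Prometheus text format. The name metric_value is hypothetical. mysql_up and node_load1 are metrics exposed by mysqld_exporter and node_exporter respectively.

```shell
# Print the value of an unlabelled metric from Prometheus text-format input on stdin.
# Comment lines (# HELP, # TYPE) never match because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example against the exporters deployed above (requires them to be running):
#   curl -s http://localhost:9104/metrics | metric_value mysql_up
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

A mysql_up value of 1 indicates that the exporter could log in to the database with the credentials from DATA_SOURCE_NAME.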
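3. Alertmanager + DingTalk alerting (target: delivery within 30 seconds)
Alert path: Prometheus → Alertmanager → webhook-dingtalk → DingTalk robot, with a delivery target of about 30 seconds
Key Alertmanager timings:
group_wait: 1s # send 1 second after the first alert of a group arrives, without waiting longer
group_interval: 10s # wait 10 seconds before notifying about new alerts added to an already-notified group
repeat_interval: 30s # re-send every 30 seconds while the alert stays unresolved
DingTalk robot webhook URL format:
https://oapi.dingtalk.com/robot/send?access_token=XXXX
Create /opt/webhook-dingtalk/config.yml:
targets:
  dingding:
    url: "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"
    secret: "YOUR_SIGNING_SECRET" # only if signing is enabled on the robot; otherwise delete this line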
Step 4: systemd service (start on boot)
Create /etc/systemd/system/webhook-dingtalk.service:
[Unit]
Description=Prometheus Webhook Dingtalk
After=network.target
[Service]
ExecStart=/opt/webhook-dingtalk/prometheus-webhook-dingtalk \
--config.file=/opt/webhook-dingtalk/config.yml \
--web.listen-address=0.0.0.0:8060
Restart=always
[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload
systemctl enable --now webhook-dingtalk
Check that it is listening:
ss -lntp | grep 8060
A line showing port 8060 in the LISTEN state means the service is running.
Step 5: Alertmanager configuration (the key to the 30-second target)
Edit /etc/alertmanager/alertmanager.yml (or the path used by your installation):
global:
  resolve_timeout: 5s
route:
  receiver: "dingding"
  group_by: ['alertname']
  group_wait: 1s # send 1 second after the first alert arrives
  group_interval: 10s # 10 seconds between notifications for the same group
  repeat_interval: 30s # repeat every 30 seconds while unresolved
receivers:
  - name: "dingding"
    webhook_configs:
      - url: "http://127.0.0.1:8060/dingtalk/dingding/send"
        send_resolved: true
⚠ If webhook-dingtalk does not run on this host, replace 127.0.0.1 with its actual IP
Restart Alertmanager:
systemctl restart alertmanager
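The Alertmanager-to-DingTalk path can be exercised without stopping any exporter by posting a synthetic alert to Alertmanager's v2 API. This is a sketch only: the helper name build_test_alert, the alert name ManualTest, and the summary text are placeholders, and Alertmanager is assumed to listen on localhost:9093 as listed in the table at the top of this report.

```shell
# Build a one-element JSON array in the format accepted by POST /api/v2/alerts.
# The alert name and summary are placeholders chosen for this test.
build_test_alert() {
  local name="${1:-ManualTest}"
  printf '[{"labels":{"alertname":"%s","severity":"critical"},"annotations":{"summary":"manual test alert"}}]' "$name"
}

# Post it to Alertmanager (requires Alertmanager to be running):
#   build_test_alert | curl -sS -X POST -H 'Content-Type: application/json' \
#     --data-binary @- http://localhost:9093/api/v2/alerts
```

If the route and receiver are configured as shown, a message should appear in the DingTalk group shortly after group_wait expires.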
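Step 6: Prometheus alert rule (to test the 30-second target)
Edit /etc/prometheus/rules/test.yml:
groups:
  - name: test-alert
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "[Instance down] {{ $labels.instance }}"
          description: "{{ $labels.instance }} is not responding (up == 0)"
Load the rule. prometheus.yml must list this file under rule_files and point alerting at Alertmanager, otherwise the alert never leaves Prometheus:
systemctl restart prometheus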
Step 7: Verify that the alert arrives within 30 seconds
Stop one of the exporters, for example:
systemctl stop node_exporter
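When judging the result of this test, it helps to estimate how long the pipeline can take in the worst case. The sketch below uses a simplified model: one full scrape interval until the failure is observed, one evaluation interval until the rule sees it, the for duration rounded up to whole evaluation intervals (a pending alert can only start firing on a later evaluation), and then group_wait. It ignores scrape timeouts, network delay, and the time webhook-dingtalk and DingTalk need, so it is an approximation and not a guarantee. The function name worst_case_latency is hypothetical.

```shell
# Rough upper bound (seconds) on the time from a target failing to Alertmanager
# dispatching the first notification. All arguments are whole seconds.
worst_case_latency() {
  local scrape="$1" eval_int="$2" for_dur="$3" group_wait="$4"
  # Round the "for" duration up to a whole number of evaluation intervals.
  local hold=$(( (for_dur + eval_int - 1) / eval_int * eval_int ))
  echo $(( scrape + eval_int + hold + group_wait ))
}

# Values from this report: scrape 15s, evaluation 15s, for 2s, group_wait 1s
worst_case_latency 15 15 2 1   # prints 46
```

Under this model the settings in this report can exceed 30 seconds in the worst case. Lowering scrape_interval and evaluation_interval to 5s would bring the same estimate down to 16 seconds.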
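3. Notes
Hot reload: systemctl reload prometheus relies on ExecReload (SIGHUP); reloading over HTTP (POST /-/reload) additionally requires --web.enable-lifecycle
External access requires --web.listen-address=0.0.0.0:9090
An exporter must be running and listening on its port before Prometheus can scrape it; otherwise the target shows as DOWN
systemctl reload prometheus is the standard way to apply configuration changes in production, because it avoids a full restart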
4. Next steps
1. Improve the DingTalk alerts
2. Learn Alertmanager inhibition and silences
3. Use Grafana in more depth and build 2 custom business dashboards
