Deploying Alertmanager
If the system clock is out of sync:
- Start the chronyd service: systemctl start chronyd
- Enable it at boot: systemctl enable chronyd
- Step the clock immediately: chronyc makestep
- Verify the result: chronyc tracking
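To script the check instead of reading `chronyc tracking` by eye, the sketch below extracts the "System time" offset and compares it with a threshold. It assumes the usual `chronyc tracking` line layout (`System time     : 0.000012345 seconds fast of NTP time`); the function name `chrony_offset_ok` and the 0.5 s default threshold are illustrative choices, not chrony settings.

```shell
#!/bin/sh
# Hypothetical helper: reads `chronyc tracking` output on stdin.
# Exit 0 if the "System time" offset is at most MAX seconds (default 0.5),
# exit 1 if it is larger, exit 2 if no "System time" line was found.
chrony_offset_ok() {
  max="${1:-0.5}"
  awk -v max="$max" '
    /^System time/ {
      # e.g. "System time     : 0.000012345 seconds fast of NTP time"
      split($0, parts, ":")
      split(parts[2], f, " ")
      offset = f[1] + 0
      found = 1
    }
    END {
      if (!found) exit 2
      if (offset <= max + 0) exit 0
      exit 1
    }'
}
```

Usage: `chronyc tracking | chrony_offset_ok 0.1 && echo "clock OK"`.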
Start Prometheus (from its install directory): nohup ./prometheus --config.file=prometheus.yml &
Start Grafana: systemctl start grafana-server
Download and extract Alertmanager: tar xf alertmanager-0.28.1.linux-amd64.tar.gz -C /home/
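The extraction step assumes the tarball is already on the host; it is normally downloaded from the project's GitHub releases page first. The sketch below builds the download URL from the naming pattern Prometheus-ecosystem releases commonly use (`<repo>/releases/download/v<version>/<name>-<version>.linux-<arch>.tar.gz`). Treat that pattern as an assumption and confirm the exact asset name on the releases page; `release_url` is a hypothetical helper. The same helper can build the DingTalk webhook URL used later.

```shell
#!/bin/sh
# Hypothetical helper: print a GitHub release download URL, assuming the
# "<name>-<version>.linux-<arch>.tar.gz" asset naming convention.
release_url() {
  repo="$1"          # e.g. prometheus/alertmanager
  name="$2"          # e.g. alertmanager
  version="$3"       # e.g. 0.28.1 (without the leading "v")
  arch="${4:-amd64}" # CPU architecture suffix
  printf 'https://github.com/%s/releases/download/v%s/%s-%s.linux-%s.tar.gz\n' \
    "$repo" "$version" "$name" "$version" "$arch"
}
```

Usage: `wget "$(release_url prometheus/alertmanager alertmanager 0.28.1)"` or `wget "$(release_url timonwong/prometheus-webhook-dingtalk prometheus-webhook-dingtalk 2.0.0)"`.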
cd /home/alertmanager-0.28.1.linux-amd64/
vim alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '17645171729@163.com'
  smtp_auth_username: '17645171729@163.com'
  smtp_auth_password: '<163-mail-authorization-code>'  # the client authorization code, not the login password
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default'
receivers:
  - name: 'default'
    email_configs:
      - to: '1258185468@qq.com'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Start Alertmanager: nohup ./alertmanager --web.listen-address=:9093 --config.file=/home/alertmanager-0.28.1.linux-amd64/alertmanager.yml &
Open http://192.168.11.160:9093 in a browser to check the web UI.
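Besides the browser check, both Alertmanager and Prometheus expose a `/-/ready` endpoint that returns HTTP 200 once the service is up. A minimal polling sketch, assuming curl is installed; the helper name and the default retry count and delay are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: poll URL until it answers with a 2xx status.
# Args: URL [TRIES=10] [DELAY_SECONDS=2]
wait_for_ready() {
  url="$1"
  tries="${2:-10}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failure; -sS: quiet, but still print errors
    if curl -fsS -o /dev/null --max-time 3 "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $tries attempts: $url" >&2
  return 1
}
```

Usage: `wait_for_ready http://192.168.11.160:9093/-/ready` for Alertmanager, or `wait_for_ready http://localhost:9090/-/ready` for Prometheus.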

Edit the Prometheus configuration: vim prometheus-3.3.0.linux-amd64/prometheus.yml
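The changes made to prometheus.yml are not shown here. For Prometheus to send alerts to Alertmanager and load the rule directory created in the next step, the file typically needs an `alerting` block and a `rule_files` entry. The sketch prints such a fragment, assuming Alertmanager runs on the same host at port 9093 and the rules live in /etc/prometheus/rules. The stock prometheus.yml already contains `alerting:` and `rule_files:` keys, so edit those keys to match rather than appending a second copy; `prom_alerting_fragment` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the prometheus.yml fragment that points
# Prometheus at Alertmanager and at the alert rule files.
# Args: [ALERTMANAGER_TARGET=localhost:9093] [RULES_GLOB=/etc/prometheus/rules/*.yml]
prom_alerting_fragment() {
  am_target="${1:-localhost:9093}"
  rules_glob="${2:-/etc/prometheus/rules/*.yml}"
  cat <<EOF
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['$am_target']

rule_files:
  - "$rules_glob"
EOF
}
```

After editing, `./promtool check config prometheus.yml` (promtool ships in the Prometheus tarball) validates the file before a restart.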

Create the alert rule directory: mkdir -p /etc/prometheus/rules
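Write the alert rules (an AI assistant such as Doubao can generate a rule for whatever you want to monitor). The rule used here is only a practice example; in production, a typical rule alerts when CPU usage exceeds 80%.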
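The rule file itself is not reproduced above. As an illustration of the production case mentioned (CPU usage above 80%), the sketch writes a single rule into the rules directory. It assumes node_exporter metrics (`node_cpu_seconds_total`) are already being scraped; the group name, alert name, file name `cpu_rules.yml`, 5-minute windows and the `PROMTOOL` variable are all hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: write an example CPU alert rule into DIR.
# Assumes node_exporter's node_cpu_seconds_total is scraped by Prometheus.
write_cpu_rule() {
  dir="${1:-/etc/prometheus/rules}"
  mkdir -p "$dir" || return 1
  # Quoted 'EOF' keeps {{ $labels... }} literal for Prometheus templating.
  cat > "$dir/cpu_rules.yml" <<'EOF'
groups:
  - name: node-cpu
    rules:
      - alert: HighCpuUsage
        # Non-idle CPU percentage per instance, averaged over 5 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'High CPU usage on {{ $labels.instance }}'
          description: 'CPU usage is {{ $value | humanize }}% (threshold 80%) for 5 minutes.'
EOF
  # Validate with promtool when available (it ships in the Prometheus tarball).
  promtool="${PROMTOOL:-./promtool}"
  if [ -x "$promtool" ]; then
    "$promtool" check rules "$dir/cpu_rules.yml"
  fi
}
```

Usage, from the Prometheus directory: `write_cpu_rule /etc/prometheus/rules`.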

Restart Prometheus: run fg 1 to bring the background job to the foreground, stop it with Ctrl+C, then start it again with nohup ./prometheus --config.file=prometheus.yml &
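Stopping and restarting through `fg` works, but Prometheus also re-reads prometheus.yml and its rule files when it receives SIGHUP, which keeps the process running. (The HTTP endpoint `/-/reload` is only enabled when Prometheus is started with `--web.enable-lifecycle`.) A minimal sketch, assuming the process name is exactly `prometheus`; `reload_by_hup` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: send SIGHUP to every process whose name is exactly NAME.
# For Prometheus this triggers a configuration reload without a restart.
reload_by_hup() {
  name="${1:-prometheus}"
  pids=$(pgrep -x "$name") || { echo "no process named $name" >&2; return 1; }
  # $pids is intentionally unquoted so several PIDs become separate arguments.
  kill -HUP $pids
}
```

Usage: `reload_by_hup prometheus`, then check the Prometheus log or the Status > Configuration page to confirm the new settings are active.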
Example of a successful alert:

Set up DingTalk alerts
Download (choose the latest release; currently v2.0.0)
Extract:
tar xf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz -C /home
ln -s /home/prometheus-webhook-dingtalk-2.0.0.linux-amd64 /home/dingtalk-webhook
Prepare the configuration:
mkdir -p /home/dingtalk-webhook/templates
cd /home/dingtalk-webhook
cp config.example.yml config.yml
Configure config.yml (fill in the DingTalk robot webhook URL and secret) and write the template file:
vim /home/dingtalk-webhook/config.yml
# Request timeout
timeout: 5s

# Uncomment the following line to write templates from scratch (be careful!)
#no_builtin_template: true

# Customizable templates path
templates:
  - /home/dingtalk.tmpl

# You can also override the default template using `default_message`.
# This uses the 'legacy' template names from v0.3.0, which dingtalk.tmpl redefines.
default_message:
  title: '{{ template "legacy.title" . }}'
  text: '{{ template "legacy.content" . }}'

# Targets (previously known as "profiles")
targets:
  default:
    url: https://oapi.dingtalk.com/robot/send?access_token=<your-access-token>
    secret: SEC<your-robot-secret>
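The `secret` turns on DingTalk's signed-robot mode: each request to the robot URL carries a `timestamp` (milliseconds) and a `sign` parameter, where `sign` is the Base64-encoded HMAC-SHA256 of `timestamp + "\n" + secret` keyed by the secret, then URL-encoded. The sketch reproduces that computation with openssl, which is handy for testing the robot directly with curl before involving Alertmanager. It assumes openssl, base64 and curl are installed; the helper names are hypothetical, and the token and secret must come from the robot's settings.

```shell
#!/bin/sh
# Hypothetical helpers for DingTalk's signed robot webhook.
# sign = Base64(HMAC-SHA256(key=secret, message="<timestamp>\n<secret>"))
dingtalk_sign() {
  ts="$1"       # milliseconds since the Unix epoch
  secret="$2"   # the SEC... string from the robot's security settings
  printf '%s\n%s' "$ts" "$secret" \
    | openssl dgst -sha256 -hmac "$secret" -binary \
    | base64
}

# Percent-encode the Base64 characters that are unsafe in a query string.
urlencode_b64() {
  printf '%s' "$1" | sed -e 's/+/%2B/g' -e 's#/#%2F#g' -e 's/=/%3D/g'
}

# Send a plain-text message straight to the robot (bypasses the :8060 service).
dingtalk_send_text() {
  token="$1"; secret="$2"; text="$3"
  ts=$(( $(date +%s) * 1000 ))
  sign=$(urlencode_b64 "$(dingtalk_sign "$ts" "$secret")")
  curl -fsS "https://oapi.dingtalk.com/robot/send?access_token=${token}&timestamp=${ts}&sign=${sign}" \
    -H 'Content-Type: application/json' \
    -d "{\"msgtype\":\"text\",\"text\":{\"content\":\"${text}\"}}"
}
```

Usage: `dingtalk_send_text '<your-access-token>' 'SEC<your-robot-secret>' 'robot test'`. If this message arrives but Alertmanager alerts do not, the problem lies between Alertmanager and the webhook service rather than in the robot settings.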
Write the template file: vim /home/dingtalk.tmpl (this path must match the `templates:` entry in config.yml)
{{ define "__subject" }}
{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}
{{ end }}
{{ define "__alert_list" }}
{{ range . }}
**Summary**: {{ .Annotations.summary }}
**Alert name**: {{ .Labels.alertname }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Details**: {{ .Annotations.description }}
**Fired at**: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
{{ end }}
{{ end }}
{{ define "__resolved_list" }}
{{ range . }}
**Resolved summary**: {{ .Annotations.summary }}
**Alert name**: {{ .Labels.alertname }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Details**: {{ .Annotations.description }}
**Fired at**: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
**Resolved at**: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
{{ end }}
{{ end }}
{{ define "dingtalk.title" }}
{{ if eq .Status "firing" }}🔥[Prometheus Alert]{{ else }}✅[Prometheus Resolved]{{ end }}
{{ end }}
{{ define "dingtalk.content" }}
{{ if eq .Status "firing" }}🔥[Prometheus Alert Notification]{{ else }}✅[Prometheus Recovery Notification]{{ end }}
{{ if gt (len .Alerts.Firing) 0 }}
**==== {{ .Alerts.Firing | len }} active fault(s) detected ====**
{{ template "__alert_list" .Alerts.Firing }}
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**==== {{ .Alerts.Resolved | len }} fault(s) resolved ====**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}
{{ define "legacy.title" }}{{ template "dingtalk.title" . }}{{ end }}
{{ define "legacy.content" }}{{ template "dingtalk.content" . }}{{ end }}
Run under systemd (start at boot):
vim /etc/systemd/system/dingtalk-webhook.service
[Unit]
Description=Prometheus DingTalk Webhook
After=network.target

[Service]
ExecStart=/home/dingtalk-webhook/prometheus-webhook-dingtalk --config.file=/home/dingtalk-webhook/config.yml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start dingtalk-webhook
systemctl enable dingtalk-webhook
Verify that port 8060 is listening:
ss -tuln | grep 8060
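`grep 8060` also matches ports such as 18060. A slightly stricter sketch checks the local-address column of `ss -tln` for an exact port; it assumes the default `ss -tln` column order (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port), and `has_listener` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -tln` output on stdin and succeed if a socket
# is listening on exactly PORT (4th column ends with ":PORT").
has_listener() {
  port="$1"
  awk -v p="$port" '
    NR > 1 && $4 ~ (":" p "$") { found = 1 }
    END { if (found) exit 0; exit 1 }'
}
```

Usage: `ss -tln | has_listener 8060 && echo "dingtalk-webhook is listening"`.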
Configure Alertmanager 0.28.1 (alertmanager.yml):
vim /home/alertmanager-0.28.1.linux-amd64/alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'instance', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h  # avoid flooding the chat
  receiver: 'dingtalk'
receivers:
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://localhost:8060/dingtalk/default/send'  # address of the forwarding service
        send_resolved: true  # also notify when an alert resolves
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
Restart Alertmanager (fg 2 brings its background job to the foreground; stop it with Ctrl+C, then start it again):
fg 2
nohup ./alertmanager --web.listen-address=:9093 --config.file=/home/alertmanager-0.28.1.linux-amd64/alertmanager.yml &
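Alertmanager can also reload alertmanager.yml in place: it accepts an HTTP POST to `/-/reload` (as well as SIGHUP), so the fg, Ctrl+C and restart cycle is optional. A minimal sketch, assuming Alertmanager listens on localhost:9093; running `./amtool check-config alertmanager.yml` from the Alertmanager directory beforehand catches YAML mistakes. `am_reload` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: ask a running Alertmanager to reload its config file.
# A failed reload (e.g. invalid YAML) returns a non-2xx status, which curl -f reports.
am_reload() {
  base="${1:-http://localhost:9093}"
  if curl -fsS -X POST "$base/-/reload"; then
    echo "reloaded: $base"
  else
    echo "reload failed: $base" >&2
    return 1
  fi
}
```

Usage: `am_reload` after editing alertmanager.yml.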
Verify the alert pipeline (test):
curl -X POST http://localhost:9093/api/v2/alerts \
-H "Content-Type: application/json" \
-d '[{
"labels": {
"alertname": "TestDingTalkAlert",
"severity": "warning",
"instance": "test-server-01"
},
"annotations": {
"summary": "测试钉钉告警",
"description": "验证 Alertmanager → 钉钉链路"
},
"startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
}]'
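To check that recovery notifications (`send_resolved: true`) also reach DingTalk, the same alert can be posted again with an `endsAt` time; Alertmanager then treats it as resolved and sends the resolved notification at the next group interval. The sketch builds the JSON body. It does no JSON escaping, so it assumes the values contain no double quotes or backslashes; `build_alert_json` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print a one-alert JSON array for POST /api/v2/alerts.
# Args: NAME SEVERITY INSTANCE SUMMARY STARTS_AT [ENDS_AT]
# Values are inserted verbatim (no escaping): avoid quotes and backslashes.
build_alert_json() {
  name="$1"; severity="$2"; instance="$3"; summary="$4"; starts="$5"; ends="${6:-}"
  ends_field=""
  if [ -n "$ends" ]; then
    ends_field=",\"endsAt\":\"$ends\""
  fi
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"%s"},"startsAt":"%s"%s}]\n' \
    "$name" "$severity" "$instance" "$summary" "$starts" "$ends_field"
}
```

Usage: set `now=$(date -u +%Y-%m-%dT%H:%M:%SZ)`, then run the curl command above with `-d "$(build_alert_json TestDingTalkAlert warning test-server-01 'DingTalk test' "$now" "$now")"` to resolve the test alert.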
