k8s Prometheus monitoring platform: Alertmanager alerting

Deploying Alertmanager

If the system clock is out of sync:

  1. Start the chronyd service: systemctl start chronyd

  2. Enable it at boot: systemctl enable chronyd

  3. Force an immediate time step: chronyc makestep

  4. Verify the result: chronyc tracking
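
Step 4 can be scripted. The sketch below assumes the usual `chronyc tracking` layout, where the `Leap status` line reads `Not synchronised` until the clock has been set; the function name `clock_state` is made up for this example.

```shell
# clock_state: read `chronyc tracking` output on stdin and print
# "synced" or "unsynced" based on the "Leap status" line.
clock_state() {
    leap=$(sed -n 's/^Leap status[[:space:]]*:[[:space:]]*//p')
    case "$leap" in
        "Not synchronised"|"") echo "unsynced" ;;  # no sync yet, or no status line
        *) echo "synced" ;;                        # e.g. "Normal"
    esac
}
```

Usage: chronyc tracking | clock_state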

Start Prometheus: nohup ./prometheus --config.file=prometheus.yml &

Start Grafana: systemctl start grafana-server

Download and extract Alertmanager: tar xf alertmanager-0.28.1.linux-amd64.tar.gz -C /home/

cd /home/alertmanager-0.28.1.linux-amd64/

vim alertmanager.yml

global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '17645171729@163.com'
  smtp_auth_username: '17645171729@163.com'
  smtp_auth_password: 'WDkNLdv5y8kNbHXm'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default'

receivers:
  - name: 'default'
    email_configs:
      - to: '1258185468@qq.com'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
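
YAML indentation mistakes are easy to make here, so it may be worth validating the file before starting the service. The release tarball ships an `amtool` binary next to `alertmanager`; the wrapper below is only a sketch, and the name `check_am_config` is made up for this example.

```shell
# check_am_config DIR: validate DIR/alertmanager.yml with the amtool
# binary found in DIR. Returns 2 if amtool is missing.
check_am_config() {
    dir=$1
    if [ ! -x "$dir/amtool" ]; then
        echo "amtool not found in $dir" >&2
        return 2
    fi
    "$dir/amtool" check-config "$dir/alertmanager.yml"
}
```

Usage: check_am_config /home/alertmanager-0.28.1.linux-amd64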

Start Alertmanager: nohup ./alertmanager --web.listen-address=:9093 --config.file=/home/alertmanager-0.28.1.linux-amd64/alertmanager.yml &

Open the web UI in a browser at 192.168.11.160:9093
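
Without a browser, the same check can be done from the command line against Alertmanager's `/-/healthy` endpoint. A minimal sketch, assuming curl is installed; the function name `am_healthy` is made up for this example.

```shell
# am_healthy HOST:PORT: succeed only if Alertmanager answers
# its health endpoint within 3 seconds.
am_healthy() {
    curl -fsS --max-time 3 "http://$1/-/healthy" >/dev/null 2>&1
}
```

Usage: am_healthy 192.168.11.160:9093 && echo "alertmanager is up"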

Edit the Prometheus configuration: vim prometheus-3.3.0.linux-amd64/prometheus.yml

Create the alert rules directory: mkdir -p /etc/prometheus/rules
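
The edited prometheus.yml itself is not shown in these notes. As a rough sketch of what usually gets added, the fragment below points Prometheus at Alertmanager and at the rules directory. It assumes Alertmanager runs on the same host on port 9093 and that rule files end in `.yml`; the function name is made up for this example.

```shell
# print_alerting_fragment: print the alerting and rule_files sections
# to merge into prometheus.yml (assumed values, adjust to your setup).
print_alerting_fragment() {
    cat <<'EOF'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - /etc/prometheus/rules/*.yml
EOF
}
```

Usage: print_alerting_fragment, then paste the output into prometheus.yml at the top level.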

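Write the alert rules (Doubao can generate a rule for whatever you want to monitor). (This is a time script; production environment: CPU usage above 80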
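
The rule file is not reproduced here, so the following is only a sketch of a CPU rule with the 80% threshold mentioned above. It assumes node_exporter is being scraped (for `node_cpu_seconds_total`); the file name, alert name and `for` duration are made up for this example. The `severity`, `summary` and `description` fields are the ones the notification templates read.

```shell
# write_cpu_rule DIR: create DIR/cpu.yml with one CPU usage alert.
# The heredoc is quoted so that $labels and $value reach Prometheus untouched.
write_cpu_rule() {
    cat > "$1/cpu.yml" <<'EOF'
groups:
  - name: node-cpu
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage above 80% for 2 minutes (current: {{ $value }}%)"
EOF
}
```

Usage: write_cpu_rule /etc/prometheus/rules, then check it with the promtool binary from the Prometheus tarball: ./promtool check rules /etc/prometheus/rules/cpu.yml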

Restart Prometheus: bring the background job to the foreground with fg 1 and stop it with Ctrl+C, then start it again: nohup ./prometheus --config.file=prometheus.yml &
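
A full restart is not strictly needed for config or rule changes, since Prometheus re-reads its configuration on SIGHUP (Alertmanager does the same). A minimal sketch, assuming `pgrep` is available; the function name `reload_by_name` is made up for this example.

```shell
# reload_by_name NAME: send SIGHUP to the first process whose name is
# exactly NAME. Returns 1 if no such process is running.
reload_by_name() {
    pid=$(pgrep -x "$1" 2>/dev/null | head -n 1)
    if [ -z "$pid" ]; then
        echo "$1 is not running" >&2
        return 1
    fi
    kill -HUP "$pid"
}
```

Usage: reload_by_name prometheus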

Example of a successful alert:

Setting up DingTalk alerts

Download (pick the latest release, currently v2.0.0)

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.0.0/prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz

Extract

tar xf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz -C /home

ln -s /home/prometheus-webhook-dingtalk-2.0.0.linux-amd64 /home/dingtalk-webhook

Prepare the configuration

mkdir -p /home/dingtalk-webhook/templates

cd /home/dingtalk-webhook

cp config.example.yml config.yml

Edit config.yml (fill in the DingTalk webhook URL and secret) and write the template file

vim /home/dingtalk-webhook/config.yml

## Request timeout
timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
templates:
  - /home/dingtalk.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  default:
    url: https://oapi.dingtalk.com/robot/send?access_token=9b67740a6b95338d6c83b4464eafc0815cf2eac766edb7698129d5854eaae870
    secret: SECee4de1b83b947f6fc402244b3d5308d84eff7ba81b4c6a275cebea18abe4fa3b

Write the template file at the path listed under templates: vim /home/dingtalk.tmpl

{{ define "__subject" }}

{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}

{{ end }}

{{ define "__alert_list" }}

{{ range . }}


**Alert summary**: {{ .Annotations.summary }}

**Alert name**: {{ .Labels.alertname }}

**Severity**: {{ .Labels.severity }}

**Host**: {{ .Labels.instance }}

**Details**: {{ .Annotations.description }}

**Started at**: {{ .StartsAt.Format "2006-01-02 15:04:05" }}

{{ end }}

{{ end }}

{{ define "__resolved_list" }}

{{ range . }}


**Resolved summary**: {{ .Annotations.summary }}

**Alert name**: {{ .Labels.alertname }}

**Severity**: {{ .Labels.severity }}

**Host**: {{ .Labels.instance }}

**Details**: {{ .Annotations.description }}

**Started at**: {{ .StartsAt.Format "2006-01-02 15:04:05" }}

**Resolved at**: {{ .EndsAt.Format "2006-01-02 15:04:05" }}

{{ end }}

{{ end }}

{{ define "dingtalk.title" }}

{{ if eq .Status "firing" }}🔥[Prometheus Alert]{{ else }}✅[Prometheus Resolved]{{ end }}

{{ end }}

{{ define "dingtalk.content" }}

{{ if eq .Status "firing" }}🔥[Prometheus Alert Notification]{{ else }}✅[Prometheus Recovery Notification]{{ end }}

{{ if gt (len .Alerts.Firing) 0 }}

**==== Detected {{ .Alerts.Firing | len }} firing alert(s) ====**

{{ template "__alert_list" .Alerts.Firing }}

{{ end }}

{{ if gt (len .Alerts.Resolved) 0 }}

**==== Resolved {{ .Alerts.Resolved | len }} alert(s) ====**

{{ template "__resolved_list" .Alerts.Resolved }}

{{ end }}

{{ end }}

{{ define "legacy.title" }}{{ template "dingtalk.title" . }}{{ end }}

{{ define "legacy.content" }}{{ template "dingtalk.content" . }}{{ end }}

Managing the service with systemd (start at boot)

vim /etc/systemd/system/dingtalk-webhook.service

[Unit]
Description=Prometheus DingTalk Webhook
After=network.target

[Service]
ExecStart=/home/dingtalk-webhook/prometheus-webhook-dingtalk --config.file=/home/dingtalk-webhook/config.yml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

systemctl daemon-reload

systemctl start dingtalk-webhook

systemctl enable dingtalk-webhook

Verify that port 8060 is listening

ss -tuln | grep 8060

Configure Alertmanager 0.28.1 (alertmanager.yml)

vim /home/alertmanager-0.28.1.linux-amd64/alertmanager.yml

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h  # avoid flooding the chat
  receiver: 'dingtalk'

receivers:

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
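
The body of the receivers section did not survive in the listing above. Since the route names a receiver called 'dingtalk', it has to be defined there. The sketch below is an assumption, not the original text: it supposes the webhook bridge listens on localhost:8060 and uses the target named `default` from config.yml, which prometheus-webhook-dingtalk exposes under /dingtalk/<target>/send. The function name is made up for this example.

```shell
# print_dingtalk_receiver: print a receivers section that forwards
# alerts to the local prometheus-webhook-dingtalk target "default".
print_dingtalk_receiver() {
    cat <<'EOF'
receivers:
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://localhost:8060/dingtalk/default/send'
        send_resolved: true
EOF
}
```

Usage: print_dingtalk_receiver, then paste the output over the receivers section of alertmanager.yml.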

Restart Alertmanager

fg 2

nohup ./alertmanager --web.listen-address=:9093 --config.file=/home/alertmanager-0.28.1.linux-amd64/alertmanager.yml &

Verify the alert pipeline (test)

curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestDingTalkAlert",
      "severity": "warning",
      "instance": "test-server-01"
    },
    "annotations": {
      "summary": "DingTalk alert test",
      "description": "Verify the Alertmanager → DingTalk pipeline"
    },
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
  }]'
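
To exercise the "resolved" branch of the template as well, the same alert can be posted again with an `endsAt` timestamp. This is a sketch under the assumption that the labels match the test alert exactly, so that Alertmanager treats it as the same alert; the function name `resolved_payload` is made up for this example.

```shell
# resolved_payload: print a JSON alert list for the test alert with
# endsAt set to the current UTC time, i.e. an alert that has ended.
resolved_payload() {
    now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    cat <<EOF
[{
  "labels": {
    "alertname": "TestDingTalkAlert",
    "severity": "warning",
    "instance": "test-server-01"
  },
  "endsAt": "$now"
}]
EOF
}
```

Usage: resolved_payload | curl -X POST http://localhost:9093/api/v2/alerts -H "Content-Type: application/json" -d @-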
