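Blackbox exporter is the black-box monitoring solution provided by the Prometheus community. It allows users to probe network endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, that is, to actively check the state of hosts and services.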
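- HTTP test
  - Define Request Header information
  - Check the HTTP status / HTTP Response Header / HTTP Body content
- TCP test
  - Monitor the port state of business components
  - Define and monitor application-layer protocols
- ICMP test
  - Host liveness check
- POST test
  - Interface connectivity
- SSL certificate expiry time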
Install Blackbox exporter
docker run -d -p 9115:9115 --name blackbox_exporter -v /root/prometheus/blackbox_exporter:/config prom/blackbox-exporter:master --config.file=/config/blackbox.yml
Write the configuration
cat >/root/prometheus/blackbox_exporter/blackbox.yml<<EOF
modules:
  http_2xx: # HTTP check module; every probe in Blackbox exporter is configured as a module
    prober: http
    timeout: 30s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"] # HTTP/2 responses report their protocol as "HTTP/2.0"
      valid_status_codes: [200] # best to state the expected status code explicitly so Grafana panels show it clearly (note by Chen Gang)
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx: # HTTP POST check module
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      method: POST
      preferred_ip_protocol: "ip4"
  tcp_connect: # TCP check module
    prober: tcp
    timeout: 10s
EOF
Restart blackbox-exporter
docker restart blackbox_exporter
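Before wiring up Prometheus, it can help to trigger one probe by hand and read the result. The sketch below is a minimal example, assuming the exporter is reachable at localhost:9115 (the port published by the docker command above) and that the http_2xx module from blackbox.yml is loaded. The helper names probe_url, parse_metrics, and run_probe are illustrative only and not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def probe_url(exporter: str, module: str, target: str) -> str:
    """Build the /probe URL that the exporter serves for one target."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


def parse_metrics(text: str) -> dict:
    """Parse unlabeled samples from the Prometheus text format into a dict.

    Comment lines (# HELP / # TYPE) and labeled series are skipped, which is
    enough to read values such as probe_success and probe_http_status_code.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples


def run_probe(exporter: str, module: str, target: str) -> dict:
    """Ask the exporter to probe the target once and return the parsed samples."""
    with urlopen(probe_url(exporter, module, target), timeout=35) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Example call (assumes the exporter is running; https://xxx.cn is the
# placeholder target used in this article):
#   result = run_probe("localhost:9115", "http_2xx", "https://xxx.cn")
#   print(result.get("probe_success"))  # 1.0 means the probe succeeded
```

A probe_success value of 0 usually means the module's checks (status code, HTTP version, timeout) rejected the response, so this is a quick way to validate blackbox.yml before adding scrape jobs.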
Edit the Prometheus configuration file
- job_name: 'blackbox_http_2xx'
  metrics_path: /probe
  params:
    module: [http_2xx] # probe with a GET request
  static_configs:
    - targets:
        - https://xxx.cn
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox_exporter:9115 # blackbox exporter address and port
- job_name: 'blackbox_tcp_connect' # check whether specific ports are reachable
  scrape_interval: 30s
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets:
        - xxx.cn:4433
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox_exporter:9115 # host and port where blackbox-exporter is running
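The three relabel_configs steps are easier to follow when traced by hand for a single target. The sketch below is a simplified model of that flow, not Prometheus's actual relabeling engine; the function names relabel and scrape_url are hypothetical.

```python
from urllib.parse import urlencode


def relabel(target: str, exporter: str = "blackbox_exporter:9115") -> dict:
    """Apply the three relabel steps from the scrape config to one target."""
    labels = {"__address__": target}
    # Step 1: copy the original address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed target visible as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at the blackbox exporter.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: dict, module: str, metrics_path: str = "/probe") -> str:
    """Build the URL that would be requested after relabeling."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


labels = relabel("https://xxx.cn")
print(labels["instance"])              # https://xxx.cn
print(scrape_url(labels, "http_2xx"))
```

The point of the pattern is that Prometheus never scrapes the target directly: it scrapes the exporter and passes the target as a query parameter, while the instance label still shows the probed address in queries and dashboards.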
Grafana configuration
Recommended Grafana dashboard template ID: 16292
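AlertManager
The alerting rules are configured as follows:
- Send an alert when an SSL certificate has fewer than 30 days left
- Alert on a non-200 HTTP status (the rule below treats anything outside 200-399 as a failure)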
groups:
  - name: Blackbox monitoring alerts
    rules:
      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "telnet probe exceeded 1 second (instance {{ $labels.instance }})"
          description: "VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "HTTP status code (instance {{ $labels.instance }})"
          description: "HTTP status code is not 200-399\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Domain certificate will expire soon (instance {{ $labels.instance }})"
          description: "Domain certificate expires in 30 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateWillExpireSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Domain certificate will expire soon (instance {{ $labels.instance }})"
          description: "Domain certificate expires in 7 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: BlackboxSslCertificateExpired
        expr: probe_ssl_earliest_cert_expiry - time() <= 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Domain certificate has expired (instance {{ $labels.instance }})"
          description: "Domain certificate has expired\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
      - alert: BlackboxProbeSlowHttp
        expr: avg_over_time(probe_http_duration_seconds[1m]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "HTTP request timed out (instance {{ $labels.instance }})"
          description: "HTTP request took longer than 10 seconds\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
Restart Prometheus
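As a sanity check on the thresholds in the alert rules above, the certificate expressions compare the seconds remaining until probe_ssl_earliest_cert_expiry against 86400 * 30 and 86400 * 7, and the HTTP rule flags any status code outside 200-399. The sketch below mirrors that arithmetic in Python and ignores the `for: 30m` hold period. The function names are hypothetical and the timestamp is an arbitrary value chosen for illustration.

```python
DAY = 86400  # seconds per day, as used in the rule expressions


def cert_alert_level(expiry_ts: float, now: float) -> str:
    """Return the most severe certificate rule whose condition would match."""
    remaining = expiry_ts - now  # probe_ssl_earliest_cert_expiry - time()
    if remaining <= 0:
        return "expired"
    if remaining < DAY * 7:
        return "critical"
    if remaining < DAY * 30:
        return "warning"
    return "ok"


def http_failure(status_code: int) -> bool:
    """Mirror: probe_http_status_code <= 199 OR probe_http_status_code >= 400."""
    return status_code <= 199 or status_code >= 400


now = 1_700_000_000  # arbitrary example timestamp
print(cert_alert_level(now + 10 * DAY, now))  # warning
print(http_failure(302), http_failure(503))   # False True
```

Note that in Prometheus itself the 30-day warning and the 7-day critical rule both fire once fewer than 7 days remain, since their conditions overlap; the function above only reports the most severe match.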