[Database Monitoring Series] Containerized Deployment of Prometheus + Alertmanager + Grafana
- Quickly install the Docker environment
- Deploy exporters on the monitored hosts
  - Configure the Redis exporter
  - Configure the MySQL exporter
- Deploy Prometheus and Grafana
- Configure Grafana dashboards
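Quickly install the Docker environment
🐬 Reference: https://blog.csdn.net/Sebastien23/article/details/137086778
Deploy exporters on the monitored hosts
🐬 Reference: https://gottdeskrieges.blog.csdn.net/article/details/136403810
Configure the Redis exporter
Deploy the exporter on the monitored host: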
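```bash
# Deploy with host networking; by default the exporter connects to Redis at localhost:6379
docker run -d --name redis_exporter --restart unless-stopped --network host oliver006/redis_exporter
```
Check whether metrics are being collected:
```bash
curl -s http://localhost:9121/metrics > redis.metrics
grep redis_up redis.metrics
```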
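Instead of reading the grep output by eye, the value can be checked in a script. In `redis_up`, `1` means the exporter reached Redis and `0` means the exporter is running but could not connect. A minimal sketch, assuming the standard Prometheus text exposition format (`name{labels} value [timestamp]` per line, `#` for HELP/TYPE comments); `metric_value` and `require_up` are hypothetical helper names, not part of redis_exporter.

```bash
#!/usr/bin/env bash
# metric_value FILE NAME
# Print the value of the first sample whose metric name is exactly NAME, with or
# without a {label="..."} set. "# HELP" / "# TYPE" comment lines are skipped.
# Returns 1 if no such sample exists. Hypothetical helper, not part of any exporter.
metric_value() {
  local file=$1 name=$2
  awk -v n="$name" '
    /^#/ { next }
    {
      line = $0
      sub(/[{][^}]*[}]/, "", line)   # drop the label set, if any
      split(line, f, " ")            # f[1] = name, f[2] = value, f[3] = optional timestamp
      if (f[1] == n) { print f[2]; found = 1; exit }
    }
    END { if (!found) exit 1 }
  ' "$file"
}

# require_up FILE NAME: succeed only if NAME is present and equal to 1.
require_up() {
  local v
  v=$(metric_value "$1" "$2") || { echo "$2 not found in $1" >&2; return 1; }
  [ "$v" = "1" ] || { echo "$2 = $v (expected 1)" >&2; return 1; }
}
```

For example, `require_up redis.metrics redis_up` after the curl above, or `require_up mysqld.metrics mysql_up` for the MySQL exporter, exits non-zero when the database is unreachable, which makes it usable in a cron job or a deployment script.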
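Configure the MySQL exporter
Create a monitoring user in the database:
```sql
-- Limit the exporter account to at most 3 concurrent connections
CREATE USER 'exporter'@'%' IDENTIFIED BY 'Monpass_XXX' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
```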
Deploy the exporter on the monitored host:
```bash
# Containerized deployment
mkdir -vp /opt/mysqld-exporter/
# Credentials file read by mysqld_exporter via --config.my-cnf;
# 'EOF' is quoted so that a $ in the password is not expanded by the shell
cat > /opt/mysqld-exporter/config.my-cnf << 'EOF'
[client]
host=127.0.0.1
user=exporter
password=Monpass_XXX
EOF
cd /opt && chown -R polkitd mysqld-exporter/
docker run -d --network host --name mysqld_exporter_1 --restart unless-stopped \
  -v /opt/mysqld-exporter/config.my-cnf:/etc/mysql/config.my-cnf \
  prom/mysqld-exporter --config.my-cnf=/etc/mysql/config.my-cnf \
  --web.listen-address=:9104 \
  --no-collect.info_schema.query_response_time \
  --no-collect.info_schema.innodb_cmp \
  --no-collect.info_schema.innodb_cmpmem \
  --collect.info_schema.processlist --collect.binlog_size
# Confirm that the container is running
docker ps -a
```
Check whether metrics are being collected:
```bash
curl -s http://localhost:9104/metrics > mysqld.metrics
grep mysql_up mysqld.metrics
```
Deploy Prometheus and Grafana
Create the installation directories and related files:
```bash
mkdir -vp /opt/docker-compose/prometheus/data/
mkdir -vp /opt/docker-compose/prometheus/conf/
mkdir -vp /opt/docker-compose/prometheus/conf/rules/
#mkdir -vp /opt/docker-compose/grafana/data/
#mkdir -vp /opt/docker-compose/grafana/conf/
touch /opt/docker-compose/prometheus/conf/prometheus.yml
touch /opt/docker-compose/prometheus/conf/alertmanager.yml
# Alerting rule files
touch /opt/docker-compose/prometheus/conf/rules/redis_alerts.yml
touch /opt/docker-compose/prometheus/conf/rules/mysql_alerts.yml
touch /opt/docker-compose/docker-compose.yml
cd /opt/ && chown -R polkitd docker-compose/
```
Prometheus configuration file
Configuration file `prometheus.yml`:
```yml
#vi /opt/docker-compose/prometheus/conf/prometheus.yml
global:
  scrape_interval: 15s     # The default is every 1 minute.
  evaluation_interval: 15s # The default is every 1 minute.
  scrape_timeout: 10s      # The default is 10s.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['<ALERTMANAGER_IP>:9093']
          # - alertmanager:9093

# Load alerting rule files (paths are relative to this file)
rule_files:
  - "rules/*.yml"
  - "rules/*.yaml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['<PROMETHEUS_SERVER_IP>:9090']
  - job_name: 'redis_exporter'
    static_configs:
      - targets: ['<REDIS_EXPORTER_IP>:9121']
  - job_name: 'mysql_exporter'
    scrape_interval: 8s
    static_configs:
      - targets: ['<MYSQL_EXPORTER_IP>:9104']
```
Note: replace the IP addresses in angle brackets with the addresses of your own hosts.
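If you keep an unedited copy of this file as a template, the placeholders can be filled in with sed instead of by hand. A minimal sketch; the template file name `prometheus.yml.tpl` and the helper `render_placeholders` are assumptions for illustration, and each shell variable is named after the placeholder it replaces.

```bash
#!/usr/bin/env bash
# render_placeholders NAME...
# Read a template on stdin, replace every <NAME> with the value of the shell
# variable NAME, and write the result to stdout. Fails if a listed variable is
# unset or empty, so no placeholder is silently left behind.
# Values must not contain |, & or \ (they would be interpreted by sed).
render_placeholders() {
  local args=() name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "variable $name is not set" >&2
      return 1
    fi
    args+=(-e "s|<${name}>|${!name}|g")
  done
  sed "${args[@]}"
}
```

Usage, with example addresses: `export ALERTMANAGER_IP=192.168.1.10 PROMETHEUS_SERVER_IP=192.168.1.10 REDIS_EXPORTER_IP=192.168.1.20 MYSQL_EXPORTER_IP=192.168.1.30`, then `render_placeholders ALERTMANAGER_IP PROMETHEUS_SERVER_IP REDIS_EXPORTER_IP MYSQL_EXPORTER_IP < prometheus.yml.tpl > prometheus.yml`. The result can then be validated with `promtool`, which ships in the prom/prometheus image: `docker run --rm --entrypoint promtool -v /opt/docker-compose/prometheus/conf:/etc/prometheus prom/prometheus check config /etc/prometheus/prometheus.yml`.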
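Alertmanager configuration file
Configuration file `alertmanager.yml`:
```yml
#vi /opt/docker-compose/prometheus/conf/alertmanager.yml
global:
  resolve_timeout: 5m                     # Mark an alert as resolved if it has not been updated for this long (default 5m)
  smtp_smarthost: 'smtp.qq.com:465'       # SMTP server and port
  smtp_from: '123456789@qq.com'           # Sender address
  smtp_auth_username: '123456789@qq.com'  # SMTP login user name
  smtp_auth_password: 'xxxxxx'            # Mailbox password or authorization code
  smtp_require_tls: false                 # Do not require STARTTLS (port 465 already uses implicit TLS)

# Custom HTML templates used for the email content
templates:
  - 'template/*.tmpl'

# Routing tree; this root route receives every alert
route:
  group_by: ['alertname']   # Labels used to group alerts
  group_wait: 10s           # How long to wait before sending the first notification for a new group
  group_interval: 60s       # How long to wait before notifying about new alerts added to an already-notified group
  repeat_interval: 1h       # How often to resend notifications for alerts that are still firing. For email, do not set this too low, or the SMTP server may reject the flood of messages
  receiver: 'email'         # Default receiver; must match a name under receivers

# Alert receivers
receivers:
  - name: 'email'                 # Receiver name referenced by the route
    email_configs:                # Email settings
      - to: '987654321@qq.com'    # Recipient address
        #html: '{{ template "test.html" . }}'  # Template for the email body
```
Note: replace the email addresses and the authorization code with your own.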
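The three timing options interact: a new group is first notified after `group_wait`, and a group that keeps firing is re-notified roughly every `repeat_interval`. Because Alertmanager only checks for resends on `group_interval` ticks, this sketch models the effective resend period as `repeat_interval` rounded up to a multiple of `group_interval`, which is an approximation. `to_seconds`, `effective_repeat` and `max_emails_per_day` are hypothetical helpers, and `to_seconds` deliberately only accepts single-unit durations such as `10s`, `5m`, `1h`.

```bash
#!/usr/bin/env bash
# to_seconds DURATION: convert a single-unit duration (10s, 5m, 1h, 1d) to seconds.
# Compound values such as 1h30m are rejected on purpose.
to_seconds() {
  local d=$1 n unit
  n=${d%[smhd]}
  unit=${d#"$n"}
  case "$n" in ''|*[!0-9]*) echo "unsupported duration: $d" >&2; return 1 ;; esac
  case "$unit" in
    s) echo "$n" ;;
    m) echo $((n * 60)) ;;
    h) echo $((n * 3600)) ;;
    d) echo $((n * 86400)) ;;
    *) echo "unsupported duration: $d" >&2; return 1 ;;
  esac
}

# effective_repeat REPEAT_INTERVAL GROUP_INTERVAL: repeat_interval rounded up to
# a multiple of group_interval (simplified model of when a resend can happen).
effective_repeat() {
  local r g
  r=$(to_seconds "$1") || return 1
  g=$(to_seconds "$2") || return 1
  echo $(( (r + g - 1) / g * g ))
}

# max_emails_per_day REPEAT_INTERVAL GROUP_INTERVAL: upper bound on resends per
# day for one alert group that keeps firing.
max_emails_per_day() {
  local p
  p=$(effective_repeat "$1" "$2") || return 1
  echo $(( 86400 / p ))
}
```

With the values above, `max_emails_per_day 1h 60s` gives 24; lowering `repeat_interval` to `10m` would allow up to 144 emails per day for a single group, which is the kind of volume that can get an account rate-limited by the SMTP server.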
rule_files: alerting rule files
Alerting rule for a Redis instance that is down:
```yml
#vi /opt/docker-compose/prometheus/conf/rules/redis_alerts.yml
groups:
  - name: redis_alert_rules
    rules:
      - alert: redis_down # name of the alerting rule
        # Alert condition as a PromQL expression over metrics scraped from the Redis exporter:
        # redis_up == 0 means the exporter is running but cannot reach Redis;
        # up == 0 means Prometheus cannot scrape the exporter itself.
        expr: redis_up == 0 or up{job="redis_exporter"} == 0
        for: 1m
        labels:
          severity: "Critical"
        # Alert information sent to Alertmanager
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute. Current value: {{ $value }}"
```
Alerting rule for a MySQL instance that is down:
```yml
#vi /opt/docker-compose/prometheus/conf/rules/mysql_alerts.yml
groups:
  - name: mysql_alert_rules
    rules:
      - alert: mysql_down
        # Alert condition as a PromQL expression over metrics scraped from the MySQL exporter:
        # mysql_up == 0 means the exporter is running but cannot reach MySQL;
        # up == 0 means Prometheus cannot scrape the exporter itself.
        expr: mysql_up == 0 or up{job="mysql_exporter"} == 0
        for: 1m
        labels:
          severity: "Critical"
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute. Current value: {{ $value }}"
```
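The `for: 1m` clause means the expression must stay true for at least one minute before the alert fires; until then the alert is only *pending*. Below is a simplified simulation of that state machine, assuming rules are evaluated every 15s (`evaluation_interval` in prometheus.yml) and feeding it a hypothetical sequence of `up` values. `simulate_for` is a hypothetical helper, and the model ignores details such as `keep_firing_for` and scrape staleness.

```bash
#!/usr/bin/env bash
# simulate_for EVAL_INTERVAL_S FOR_S VALUE...
# Simplified model of Prometheus "for" handling for a rule like "up == 0":
#   - the first evaluation where the value is 0 makes the alert pending and records activeAt;
#   - once (now - activeAt) >= FOR_S, the alert is firing;
#   - any evaluation where the value is not 0 resets it to inactive.
# Prints one state per evaluation.
simulate_for() {
  local interval=$1 hold=$2
  shift 2
  local t=0 active_at=-1 v state
  for v in "$@"; do
    if [ "$v" -eq 0 ]; then
      if [ "$active_at" -lt 0 ]; then active_at=$t; fi
      if [ $((t - active_at)) -ge "$hold" ]; then state=firing; else state=pending; fi
    else
      active_at=-1
      state=inactive
    fi
    echo "t=${t}s up=$v $state"
    t=$((t + interval))
  done
}
```

Running `simulate_for 15 60 1 0 0 0 0 0 1` shows the alert pending at t=15s through t=60s and firing only at t=75s, the fifth consecutive failing evaluation. In practice the first email therefore arrives no sooner than about one minute after the first failed evaluation, plus Alertmanager's `group_wait` (10s).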
docker-compose deployment file
Write a docker compose file to deploy Prometheus, Alertmanager, and Grafana.
```yml
#vi /opt/docker-compose/docker-compose.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    network_mode: host
    container_name: prometheus_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    # Port mappings are ignored with network_mode: host
    #ports:
    #  - '9090:9090'
    environment:
      TZ: "Asia/Shanghai"
    #command: ["--config.file=/etc/prometheus/prometheus.yml"]
    volumes:
      - ./prometheus/data/:/prometheus/
      - ./prometheus/conf/rules/:/etc/prometheus/rules/
      - ./prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
  alertmanager:
    image: prom/alertmanager
    network_mode: host
    container_name: alertmanager_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    #ports:
    #  - '9093:9093'
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      - ./prometheus/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
  grafana:
    image: grafana/grafana
    network_mode: host
    container_name: grafana_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    #ports:
    #  - '3000:3000'
    environment:
      # List form: KEY=value without quotes (quotes would become part of the value)
      - TZ=Asia/Shanghai
      #- GF_INSTALL_PLUGINS=grafana-simple-json-datasource
    volumes:
      - 'grafana_storage:/var/lib/grafana'
# DECLARE DOCKER VOLUME FOR GRAFANA_STORAGE
volumes:
  grafana_storage: {}
```
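Start the containers and check them:
```bash
# Start the containers
cd /opt/docker-compose/ && docker-compose up -d
# Check container status
cd /opt/docker-compose/ && docker-compose ps
# Stop the containers
cd /opt/docker-compose/ && docker-compose stop
```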
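Post-deployment checks:
- Open `http://<Prometheus_SERVER_IP>:9090` to check that Prometheus is running.
- Open `http://<Alertmanager_SERVER_IP>:9093` to check that Alertmanager is running.
- Open `http://<Grafana_SERVER_IP>:3000` to check that Grafana is running. The default user name and password are both admin; you are asked to change the password after the first login.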
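The same checks can be scripted against the services' health endpoints: Prometheus and Alertmanager expose `/-/healthy`, and Grafana exposes `/api/health`. A minimal sketch, assuming all three services run on one host with the default ports used in this article; `check_endpoint` and `check_stack` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# check_endpoint NAME URL: report whether URL answers without an HTTP error
# (status < 400) within 5 seconds.
check_endpoint() {
  local name=$1 url=$2
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "$name OK"
  else
    echo "$name FAILED ($url)"
    return 1
  fi
}

# check_stack [HOST]: probe Prometheus, Alertmanager and Grafana on one host
# (assumption: default ports 9090 / 9093 / 3000 with host networking, as in this article).
check_stack() {
  local host=${1:-localhost} rc=0
  check_endpoint prometheus   "http://$host:9090/-/healthy"  || rc=1
  check_endpoint alertmanager "http://$host:9093/-/healthy"  || rc=1
  check_endpoint grafana      "http://$host:3000/api/health" || rc=1
  return "$rc"
}
```

Running `check_stack 192.168.1.10` prints one line per service and returns non-zero if any of them is down, so it can follow `docker-compose up -d` in a deployment script.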
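Common errors
- If the docker-compose file does not set `user: '0'`, Prometheus may fail with the error below. The prom/prometheus image runs as the unprivileged user `nobody` by default, which cannot write to the bind-mounted `./prometheus/data/` directory; `user: '0'` runs the container as root instead.
```
caller=query_logger.go:86 level=error component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
```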
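- The Grafana `environment` entries below fail because each list item (`- ...`) must be a plain `KEY=value` string, whereas `- TZ: "Asia/Shanghai"` turns every item into a `key: value` map, which is exactly what the panic message reports. Use either the list form with an equals sign (`- TZ=Asia/Shanghai`, without quotes, because quotes in this form become part of the value) or the map form without dashes (`TZ: "Asia/Shanghai"`), as in the Prometheus and Alertmanager services:
```yml
environment:
  - TZ: "Asia/Shanghai"
  - GF_INSTALL_PLUGINS: "grafana-clock-panel,grafana-simple-json-datasource"
...
# Error when starting the containers
panic: interface conversion: interface {} is map[string]interface {}, not string
```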
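- If the docker-compose file tells the Grafana container to install plugins (for example grafana-clock-panel), the container may keep restarting. The container log shows the following error:
```bash
$ docker logs -f -n 100 grafana_1
...
Error: ✗ Plugin not found (Grafana v8.3.3 linux-amd64)
```
In that case, check that the plugin ID is spelled correctly, that the plugin is available for the installed Grafana version, and that no quotation marks ended up in the value.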
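Both syntax pitfalls can be caught before starting the containers with a rough heuristic check. This is not a Compose feature and not a YAML parser: `lint_env_list` is a hypothetical helper that only inspects list items whose key is written in upper case, flagging `- KEY: value` items (the map-in-list mistake behind the panic above) and `- KEY="value"` items (quotes kept as part of the value).

```bash
#!/usr/bin/env bash
# lint_env_list FILE
# Heuristic: flag list-style environment entries that Compose will not read as intended.
#   "- KEY: value"    -> a map inside the list (triggers the interface-conversion panic)
#   "- KEY=\"value\"" -> quotes are kept as part of the value
# Only lines that look like "- UPPER_CASE..." are inspected. Returns 1 if anything was flagged.
lint_env_list() {
  local file=$1 lineno=0 found=0 line
  local re_map='^[[:space:]]*-[[:space:]]*[[:upper:]_][[:upper:][:digit:]_]*:'
  local re_quoted="^[[:space:]]*-[[:space:]]*[[:upper:]_][[:upper:][:digit:]_]*=[\"']"
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    if [[ $line =~ $re_map ]]; then
      echo "$file:$lineno: map entry in environment list (use KEY=value or drop the dash)"
      found=1
    elif [[ $line =~ $re_quoted ]]; then
      echo "$file:$lineno: quoted value in list form (the quotes become part of the value)"
      found=1
    fi
  done < "$file"
  return "$found"
}
```

For example, `lint_env_list /opt/docker-compose/docker-compose.yml` prints nothing and succeeds for the compose file in this article; because it only pattern-matches lines, lower-case keys or unusual layouts can slip through.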
Configure Grafana dashboards
Add a data source
Under Add data source, choose Prometheus, enter `http://<Prometheus_SERVER_IP>:9090` in the URL field under HTTP, and save.
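The data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. A minimal sketch, assuming basic authentication with an admin account (the default admin:admin only works until the password is changed on first login); `datasource_payload` and `create_datasource` are hypothetical helper names, and only the basic data source fields `name`, `type`, `url`, `access` and `isDefault` are set.

```bash
#!/usr/bin/env bash
# datasource_payload PROMETHEUS_URL: JSON body for a Prometheus data source.
# "access": "proxy" means the Grafana server, not the browser, queries Prometheus.
datasource_payload() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1"
}

# create_datasource GRAFANA_URL PROMETHEUS_URL [USER:PASSWORD]
create_datasource() {
  local grafana=$1 prom=$2 auth=${3:-admin:admin}
  curl -fsS -u "$auth" \
    -H 'Content-Type: application/json' \
    -X POST "$grafana/api/datasources" \
    -d "$(datasource_payload "$prom")"
}
```

Usage: `create_datasource http://<Grafana_SERVER_IP>:3000 http://<Prometheus_SERVER_IP>:9090 admin:<your password>`. Running it a second time fails because a data source with the same name already exists.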
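Create a dashboard
On the New dashboard page, click Import dashboard on the right (Create --> Import), enter the dashboard ID and load it, select Prometheus as the data source under Options, and click Import.
- Recommended Redis dashboards: 763, 11835
- Recommended MySQL dashboard: 7362
Custom dashboards will be covered in a separate article.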