[Database Monitoring Series] Containerized Deployment of Prometheus + Alertmanager + Grafana


Quick Docker installation

🐬 Reference: https://blog.csdn.net/Sebastien23/article/details/137086778

Deploying exporters on the monitored hosts

🐬 Reference: https://gottdeskrieges.blog.csdn.net/article/details/136403810

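Configuring the Redis exporter

Deploy the exporter on the monitored host:

bash
# Deploy in host network mode only
# (with host networking the exporter reaches Redis at its default address, redis://localhost:6379)
docker run -d --name redis_exporter --restart unless-stopped --network host oliver006/redis_exporter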

Check that metrics are being collected:

bash
curl -s http://localhost:9121/metrics > redis.metrics
grep redis_up redis.metrics
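
The grep above only shows that the metric line exists. A script usually needs the value itself, so that it can fail when the value is not 1. The helper below is a minimal sketch of that idea; the function name metric_value and the sample text are made up for illustration, and the helper only handles samples without labels (which is the case for redis_up and mysql_up).

```shell
#!/bin/sh
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the first label-free sample whose name is exactly NAME.
# Hypothetical helper, not part of any exporter.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                          # skip HELP/TYPE comment lines
    $1 == name { print $2; exit }          # "<name> <value>" sample line
  '
}

# Illustrative sample, not real exporter output.
sample='# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1'

printf '%s\n' "$sample" | metric_value redis_up   # prints 1
```

Against a running exporter the same helper could be used as: curl -s http://localhost:9121/metrics | metric_value redis_up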

Configuring the MySQL exporter

Create a monitoring user in the database:

sql
CREATE USER 'exporter'@'%' IDENTIFIED BY 'Monpass_XXX' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Deploy the exporter on the monitored host:

bash
# Containerized deployment
mkdir -vp /opt/mysqld-exporter/

cat > /opt/mysqld-exporter/config.my-cnf << EOF
[client]
host=127.0.0.1
user=exporter
password=Monpass_XXX
EOF

cd /opt && chown -R polkitd mysqld-exporter/

docker run -d --network host --name mysqld_exporter_1 --restart unless-stopped \
-v /opt/mysqld-exporter/config.my-cnf:/etc/mysql/config.my-cnf \
prom/mysqld-exporter --config.my-cnf=/etc/mysql/config.my-cnf \
--web.listen-address=:9104 \
--no-collect.info_schema.query_response_time \
--no-collect.info_schema.innodb_cmp \
--no-collect.info_schema.innodb_cmpmem \
--collect.info_schema.processlist --collect.binlog_size

docker ps -a

Check that metrics are being collected:

bash
curl -s http://localhost:9104/metrics > mysqld.metrics
grep mysql_up mysqld.metrics

Deploying Prometheus and Grafana

Create the installation directories and the related files:

bash
mkdir -vp /opt/docker-compose/prometheus/data/
mkdir -vp /opt/docker-compose/prometheus/conf/
mkdir -vp /opt/docker-compose/prometheus/conf/rules/

#mkdir -vp /opt/docker-compose/grafana/data/
#mkdir -vp /opt/docker-compose/grafana/conf/

touch /opt/docker-compose/prometheus/conf/prometheus.yml
touch /opt/docker-compose/prometheus/conf/alertmanager.yml

# Alert rule files
touch /opt/docker-compose/prometheus/conf/rules/redis_alerts.yml
touch /opt/docker-compose/prometheus/conf/rules/mysql_alerts.yml

touch /opt/docker-compose/docker-compose.yml

cd /opt/ && chown -R polkitd docker-compose/ 

Prometheus configuration file

Configuration file prometheus.yml

yml
#vi /opt/docker-compose/prometheus/conf/prometheus.yml

global:
  scrape_interval:     15s # Default is every 1 minute.
  evaluation_interval: 15s # The default is every 1 minute.
  scrape_timeout:      10s # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['<ALERTMANAGER_IP>:9093']
      # - alertmanager:9093

# load alerting rule files
rule_files:
  - "rules/*.yml"
  - "rules/*.yaml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['<PROMETHEUS_SERVER_IP>:9090']

  - job_name: 'redis_exporter'
    static_configs:
    - targets: ['<REDIS_EXPORTER_IP>:9121']

  - job_name: 'mysql_exporter'
    scrape_interval: 8s
    static_configs:
      - targets: ['<MYSQL_EXPORTER_IP>:9104']

Note: replace the IP addresses in angle brackets with the actual addresses in your environment.
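
When the same file is deployed to several environments, the placeholders can be filled in by a script instead of by hand. The following is a minimal sketch using sed; the function name render_config, the argument order, and the example addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for illustration.

```shell
#!/bin/sh
# render_config AM_IP PROM_IP REDIS_IP MYSQL_IP: read prometheus.yml with
# <..._IP> placeholders on stdin and write the filled-in file to stdout.
# Hypothetical helper; arguments are the Alertmanager, Prometheus,
# Redis exporter and MySQL exporter addresses, in that order.
render_config() {
  sed -e "s/<ALERTMANAGER_IP>/$1/g" \
      -e "s/<PROMETHEUS_SERVER_IP>/$2/g" \
      -e "s/<REDIS_EXPORTER_IP>/$3/g" \
      -e "s/<MYSQL_EXPORTER_IP>/$4/g"
}

# Example on a single line of the config, with made-up addresses.
printf "    - targets: ['<REDIS_EXPORTER_IP>:9121']\n" |
  render_config 192.0.2.10 192.0.2.10 192.0.2.21 192.0.2.22
# prints:     - targets: ['192.0.2.21:9121']
```

For a whole file, read from a copy that still contains the placeholders and write to a different file; redirecting the output onto the input file itself would truncate it before sed reads it.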

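Alertmanager configuration file

Configuration file alertmanager.yml

yml
#vi /opt/docker-compose/prometheus/conf/alertmanager.yml

global:
  resolve_timeout: 5m                       # time after which an alert that is no longer reported is treated as resolved (default 5m)
  smtp_smarthost: 'smtp.qq.com:465'         # SMTP server (host:port) used to send mail
  smtp_from: '123456789@qq.com'             # sender address
  smtp_auth_username: '123456789@qq.com'    # user name of the sending mailbox
  smtp_auth_password: 'xxxxxx'              # mailbox password or authorization code
  smtp_require_tls: false                   # do not require STARTTLS

# Custom HTML templates, used for the email body
templates:
  - 'template/*.tmpl'

# Root of the routing tree; this route receives every alert
route:
  group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s         # how long to wait before sending the first notification for a new group
  group_interval: 60s     # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h     # how long to wait before repeating a notification. For email, do not set this too low, or the SMTP server may reject the mail for being sent too often
  receiver: 'email'       # name of the receiver for these alerts, see receivers.name below

# Receiver definitions
receivers:
  - name: 'email'                            # receiver name referenced by the route
    email_configs:                           # email settings
    - to: '987654321@qq.com'                 # address that receives the alerts
      #html: '{{ template "test.html" . }}'  # template for the email body

Note: replace the email addresses and the authorization code with your own.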
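
Before starting the container, the file can be checked for syntax errors with amtool. The sketch below assumes that the prom/alertmanager image ships amtool on its PATH, so nothing has to be installed on the host; the function name check_alertmanager_config is hypothetical.

```shell
#!/bin/sh
# check_alertmanager_config [FILE]: validate alertmanager.yml with the amtool
# binary from the prom/alertmanager image (assumed to be on the image's PATH).
# Hypothetical helper; FILE defaults to the path used in this article.
check_alertmanager_config() {
  conf_file="${1:-/opt/docker-compose/prometheus/conf/alertmanager.yml}"
  docker run --rm \
    -v "${conf_file}:/etc/alertmanager/alertmanager.yml:ro" \
    --entrypoint amtool \
    prom/alertmanager \
    check-config /etc/alertmanager/alertmanager.yml
}

# Usage:
#   check_alertmanager_config
```

A non-zero exit status means the configuration was rejected, so the function can gate a deployment script.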

Alert rule files (rule_files)

Redis down alert rule:

yml
#vi /opt/docker-compose/prometheus/conf/rules/redis_alerts.yml

groups:
- name: redis_alert_rules
  rules:
  - alert: redis_down     # name of the alerting rule
    # Trigger condition, written as a PromQL expression. The up metric is generated by
    # Prometheus for every scrape target, so this rule fires when the redis exporter cannot
    # be scraped. To alert on the Redis instance itself, the exporter's redis_up metric
    # can be used instead (redis_up == 0).
    expr: up{job="redis_exporter"} == 0
    for: 1m
    labels:
      severity: "Critical"
    # alert information that will be sent to Alertmanager
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute. Current value: {{ $value }}"
      

MySQL down alert rule:

yml
#vi /opt/docker-compose/prometheus/conf/rules/mysql_alerts.yml

groups:
- name: mysql_alert_rules
  rules:
  - alert: mysql_down
    # Trigger condition, written as a PromQL expression. The up metric is generated by
    # Prometheus for every scrape target, so this rule fires when the mysql exporter cannot
    # be scraped. To alert on the MySQL instance itself, the exporter's mysql_up metric
    # can be used instead (mysql_up == 0).
    expr: up{job="mysql_exporter"} == 0
    for: 1m
    labels:
      severity: "Critical"
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute. Current value: {{ $value }}"
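
Rule files with a YAML or PromQL mistake prevent Prometheus from loading its configuration, so it can help to validate them first. The sketch below runs promtool check rules from the prom/prometheus image, assuming the image ships promtool on its PATH; the function name check_rules is hypothetical and the file names are the two created earlier in this article.

```shell
#!/bin/sh
# check_rules [RULES_DIR]: validate both rule files with the promtool binary
# from the prom/prometheus image (assumed to be on the image's PATH).
# Hypothetical helper; RULES_DIR defaults to the directory used in this article.
check_rules() {
  rules_dir="${1:-/opt/docker-compose/prometheus/conf/rules}"
  docker run --rm \
    -v "${rules_dir}:/rules:ro" \
    --entrypoint promtool \
    prom/prometheus \
    check rules /rules/redis_alerts.yml /rules/mysql_alerts.yml
}

# Usage:
#   check_rules
```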
      

docker-compose deployment file

Write a docker compose file that deploys Prometheus, Alertmanager, and Grafana.

yml
#vi /opt/docker-compose/docker-compose.yml

version: '3'

services:
  prometheus:
    image: prom/prometheus
    network_mode: host
    container_name: prometheus_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    #ports:
    #  - '9090:9090'
    environment:
      TZ: "Asia/Shanghai"
    #command: ["/etc/prometheus/prometheus.yml"]
    volumes:
      - ./prometheus/data/:/prometheus/
      - ./prometheus/conf/rules/:/etc/prometheus/rules/
      - ./prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml

  alertmanager:
    image: prom/alertmanager
    network_mode: host
    container_name: alertmanager_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    #ports:
    #  - '9093:9093'
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      - ./prometheus/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      
  grafana:
    image: grafana/grafana
    network_mode: host
    container_name: grafana_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'  
    #ports:
    #  - '3000:3000'
    environment:
      # list form: write KEY=VALUE without quotes, otherwise the quotes become part of the value
      - TZ=Asia/Shanghai
      #- GF_INSTALL_PLUGINS=grafana-simple-json-datasource
    volumes:
      - 'grafana_storage:/var/lib/grafana'

# DECLARE DOCKER VOLUME FOR GRAFANA_STORAGE      
volumes:
  grafana_storage: {}

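Start the containers and check them:

bash
# Start the containers
cd /opt/docker-compose/ && docker-compose up -d

# Stop the containers
cd /opt/docker-compose/ && docker-compose stop

Post-deployment checks:

  • Open http://<Prometheus_SERVER_IP>:9090 to check that Prometheus was deployed successfully.
  • Open http://<Alertmanager_SERVER_IP>:9093 to check that Alertmanager was deployed successfully.
  • Open http://<Grafana_SERVER_IP>:3000 to check that Grafana was deployed successfully. The default user name and password are both admin, and the password must be changed after the first login.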
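
The same three checks can be scripted against the services' health endpoints (/-/healthy for Prometheus and Alertmanager, /api/health for Grafana). This is a minimal sketch that assumes all three services run on one host, as in the docker-compose file above; the function names probe and check_stack are hypothetical.

```shell
#!/bin/sh
# probe URL: succeed only if the endpoint answers without an HTTP error
# (curl -f fails on status >= 400) within 5 seconds.
probe() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# check_stack [HOST]: probe the health endpoint of each service on HOST and
# print one line per service. Returns non-zero if any probe failed.
check_stack() {
  host="${1:-localhost}"
  failed=0
  for target in \
    "prometheus=http://${host}:9090/-/healthy" \
    "alertmanager=http://${host}:9093/-/healthy" \
    "grafana=http://${host}:3000/api/health"
  do
    name="${target%%=*}"   # text before the first "="
    url="${target#*=}"     # text after the first "="
    if probe "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_stack <SERVER_IP>
```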

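Common errors

  1. If user: '0' is missing from the docker-compose file, the container's default user may not be able to write to the bind-mounted ./prometheus/data/ directory, and you may see the following error:
bash
caller=query_logger.go:86 level=error component=activeQueryTracker msg="Error opening query log file" 
file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
  2. In the list form of environment (entries that start with a dash), each variable must be written as KEY=VALUE. Combining the dash with the KEY: VALUE mapping form, as below, produces the following error. The mapping form without dashes, as used for the prometheus service, is also valid.
yml
environment:
  - TZ: "Asia/Shanghai"
  - GF_INSTALL_PLUGINS: "grafana-clock-panel,grafana-simple-json-datasource"
...

# Error when starting the containers
panic: interface conversion: interface {} is map[string]interface {}, not string
  3. If the docker-compose file tells the Grafana container to install certain plugins (for example grafana-clock-panel), the grafana container may restart continuously. The container logs show the following error:
bash
$ docker logs -f --tail 50 grafana_1
...
Error: ✗ Plugin not found (Grafana v8.3.3 linux-amd64)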
Configuring Grafana dashboards

Adding a data source

In Add data source, select Prometheus, paste http://<Prometheus_SERVER_IP>:9090 into the URL field under HTTP, and save.
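
The data source can also be created through Grafana's HTTP API (POST /api/datasources), which is convenient when the setup has to be repeated. The sketch below uses basic authentication; the function name add_prometheus_datasource and the GRAFANA_USER / GRAFANA_PASSWORD environment variables are assumptions for illustration, defaulting to the initial admin/admin credentials.

```shell
#!/bin/sh
# add_prometheus_datasource GRAFANA_URL PROMETHEUS_URL: create a Prometheus
# data source named "Prometheus" and mark it as the default.
# Hypothetical helper; credentials come from GRAFANA_USER / GRAFANA_PASSWORD
# (assumed variable names) and fall back to admin/admin.
add_prometheus_datasource() {
  grafana_url="$1"      # e.g. http://<Grafana_SERVER_IP>:3000
  prometheus_url="$2"   # e.g. http://<Prometheus_SERVER_IP>:9090
  curl -fsS -X POST "${grafana_url}/api/datasources" \
    -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:-admin}" \
    -H 'Content-Type: application/json' \
    -d "{\"name\":\"Prometheus\",\"type\":\"prometheus\",\"access\":\"proxy\",\"url\":\"${prometheus_url}\",\"isDefault\":true}"
}

# Usage:
#   GRAFANA_PASSWORD='<new password>' \
#     add_prometheus_datasource http://<Grafana_SERVER_IP>:3000 http://<Prometheus_SERVER_IP>:9090
```

Once the admin password has been changed at first login, GRAFANA_PASSWORD has to be set to the new value for the call to succeed.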

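Creating a dashboard

On the New dashboard page, click Import on the right (Create --> Import), enter the dashboard ID and load it, select Prometheus as the data source under Options, and then click Import.

  • Recommended Redis dashboards: 763, 11835
  • Recommended MySQL dashboard: 7362

Custom dashboards will be covered in a separate article.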


java·数据库·sql