[Database Monitoring Series] Containerized Deployment of Prometheus + Alertmanager + Grafana

Quick Docker Environment Setup

🐬 Reference: https://blog.csdn.net/Sebastien23/article/details/137086778

Deploying Exporters on the Monitored Hosts

🐬 Reference: https://gottdeskrieges.blog.csdn.net/article/details/136403810

Configuring the Redis Exporter

Deploy the exporter on the monitored host:

bash
# Deploy in host network mode
docker run -d --name redis_exporter --restart unless-stopped --network host oliver006/redis_exporter

Check that metrics are being collected:

bash
curl -s http://localhost:9121/metrics > redis.metrics
grep redis_up redis.metrics
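
The grep above also prints the HELP and TYPE comment lines. For scripted checks it can help to extract only the sample value. The helper below is a minimal sketch: `metric_value` is a hypothetical name introduced here, and it assumes the metric is exposed without labels, which is how `redis_up` and `mysql_up` usually appear.

```shell
# metric_value NAME: read Prometheus exposition text on stdin and print the
# value of the unlabeled metric NAME. Returns non-zero if NAME is absent.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; found = 1 } END { exit !found }'
}

# Example, with the exporter listening on this host:
#   curl -s http://localhost:9121/metrics | metric_value redis_up
```

A printed value of 1 means the exporter reached Redis; 0 means the exporter is running but could not connect to Redis.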

Configuring the MySQL Exporter

Create a monitoring user in the database:

sql
CREATE USER 'exporter'@'%' IDENTIFIED BY 'Monpass_XXX' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';

Deploy the exporter on the monitored host:

bash
# Containerized deployment
mkdir -vp /opt/mysqld-exporter/

cat > /opt/mysqld-exporter/config.my-cnf << EOF
[client]
host=127.0.0.1
user=exporter
password=Monpass_XXX
EOF

cd /opt && chown -R polkitd mysqld-exporter/

docker run -d --network host --name mysqld_exporter_1 --restart unless-stopped \
-v /opt/mysqld-exporter/config.my-cnf:/etc/mysql/config.my-cnf \
prom/mysqld-exporter --config.my-cnf=/etc/mysql/config.my-cnf \
--web.listen-address=:9104 \
--no-collect.info_schema.query_response_time \
--no-collect.info_schema.innodb_cmp \
--no-collect.info_schema.innodb_cmpmem \
--collect.info_schema.processlist --collect.binlog_size

docker ps -a

Check that metrics are being collected:

bash
curl -s http://localhost:9104/metrics > mysqld.metrics
grep mysql_up mysqld.metrics

Deploying Prometheus and Grafana

Create the installation directories and related files:

bash
mkdir -vp /opt/docker-compose/prometheus/data/
mkdir -vp /opt/docker-compose/prometheus/conf/
mkdir -vp /opt/docker-compose/prometheus/conf/rules/

#mkdir -vp /opt/docker-compose/grafana/data/
#mkdir -vp /opt/docker-compose/grafana/conf/

touch /opt/docker-compose/prometheus/conf/prometheus.yml
touch /opt/docker-compose/prometheus/conf/alertmanager.yml

# Alert rule files
touch /opt/docker-compose/prometheus/conf/rules/redis_alerts.yml
touch /opt/docker-compose/prometheus/conf/rules/mysql_alerts.yml

touch /opt/docker-compose/docker-compose.yml

cd /opt/ && chown -R polkitd docker-compose/ 

Prometheus Configuration File

Configuration file prometheus.yml:

yml
#vi /opt/docker-compose/prometheus/conf/prometheus.yml

global:
  scrape_interval:     15s # Default is every 1 minute.
  evaluation_interval: 15s # The default is every 1 minute.
  scrape_timeout:      10s # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['<ALERTMANAGER_IP>:9093']
      # - alertmanager:9093

# load alerting rule files
rule_files:
  - "rules/*.yml"
  - "rules/*.yaml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['<PROMETHEUS_SERVER_IP>:9090']

  - job_name: 'redis_exporter'
    static_configs:
    - targets: ['<REDIS_EXPORTER_IP>:9121']

  - job_name: 'mysql_exporter'
    scrape_interval: 8s
    static_configs:
      - targets: ['<MYSQL_EXPORTER_IP>:9104']

Note: replace the IP addresses in angle brackets to match your environment.
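
Before starting the stack, the configuration can be validated with promtool, which ships inside the prom/prometheus image. The sketch below is one way to do that, run once the placeholders have been replaced. `check_prometheus_config` and the `DOCKER` override variable are hypothetical names introduced here; the default directory is the one created earlier. Mounting the whole conf directory lets promtool resolve the relative `rule_files` paths as well.

```shell
# check_prometheus_config [CONF_DIR]: validate prometheus.yml (and the rule
# files it references) with the promtool binary from the prom/prometheus image.
# DOCKER is a hypothetical override hook (e.g. DOCKER=podman); default: docker.
check_prometheus_config() {
  conf_dir="${1:-/opt/docker-compose/prometheus/conf}"
  "${DOCKER:-docker}" run --rm --entrypoint promtool \
    -v "$conf_dir:/etc/prometheus:ro" \
    prom/prometheus check config /etc/prometheus/prometheus.yml
}

# Example:
#   check_prometheus_config
```

A non-zero exit status indicates that promtool rejected the file.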

Alertmanager Configuration File

Configuration file alertmanager.yml:

yml
#vi /opt/docker-compose/prometheus/conf/alertmanager.yml

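global:
  resolve_timeout: 5m                       # time after which an alert is declared resolved if it stops being reported; default 5m
  smtp_smarthost: 'smtp.qq.com:465'         # SMTP server (host:port) used to send mail
  smtp_from: '123456789@qq.com'             # sender address
  smtp_auth_username: '123456789@qq.com'    # SMTP login username
  smtp_auth_password: 'xxxxxx'              # mailbox password or authorization code
  smtp_require_tls: false                   # do not require STARTTLS

# Custom HTML templates used for the notification body
templates:
  - 'template/*.tmpl'

# Routing tree; this root route receives all alerts
route:
  group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s         # how long to wait before sending the first notification for a new group
  group_interval: 60s     # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h     # how long to wait before re-sending a notification; for email, do not set this too low, or the SMTP server may reject the messages for being too frequent
  receiver: 'email'       # name of the receiver; must match a receivers.name entry below

# Alert receivers
receivers:
  - name: 'email'                            # receiver name referenced by the route
    email_configs:                           # email settings
    - to: '987654321@qq.com'                 # address that receives alert emails
      #html: '{{ template "test.html" . }}'  # email body template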

Note: replace the email addresses and the authorization code with your own.
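
The Alertmanager file can be checked in a similar way with amtool, which is bundled in the prom/alertmanager image. This is a sketch under the same assumptions as the rest of this article: `check_alertmanager_config` and the `DOCKER` override are hypothetical names, and the default path is the file created earlier.

```shell
# check_alertmanager_config [FILE]: validate alertmanager.yml with the amtool
# binary from the prom/alertmanager image.
# DOCKER is a hypothetical override hook (e.g. DOCKER=podman); default: docker.
check_alertmanager_config() {
  conf_file="${1:-/opt/docker-compose/prometheus/conf/alertmanager.yml}"
  "${DOCKER:-docker}" run --rm --entrypoint amtool \
    -v "$conf_file:/etc/alertmanager/alertmanager.yml:ro" \
    prom/alertmanager check-config /etc/alertmanager/alertmanager.yml
}

# Example:
#   check_alertmanager_config
```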

rule_files: Alert Rule Files

Alert rule for a Redis instance going down:

yml
#vi /opt/docker-compose/prometheus/conf/rules/redis_alerts.yml

groups:
- name: redis_alert_rules
  rules:
  - alert: redis_down     # name of the alerting rule
    # Trigger condition, written as a PromQL expression. `up` is recorded by Prometheus
    # for every scrape target; 0 means the redis exporter could not be scraped.
    # To detect Redis being down while the exporter is still running, check redis_up == 0.
    expr: up{job="redis_exporter"} == 0
    for: 1m
    labels:
      severity: "Critical"
    # alert info that will be sent to Alertmanager
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute. Current value: {{ $value }}"
      

Alert rule for a MySQL instance going down:

yml
#vi /opt/docker-compose/prometheus/conf/rules/mysql_alerts.yml

groups:
- name: mysql_alert_rules
  rules:
  - alert: mysql_down
    # Trigger condition, written as a PromQL expression. `up` is recorded by Prometheus
    # for every scrape target; 0 means the mysql exporter could not be scraped.
    # To detect MySQL being down while the exporter is still running, check mysql_up == 0.
    expr: up{job="mysql_exporter"} == 0
    for: 1m
    labels:
      severity: "Critical"
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute. Current value: {{ $value }}"
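
Rule files can be linted on their own with `promtool check rules` after each edit. The following is a sketch only: `check_rules`, `RULES_DIR`, and `DOCKER` are hypothetical names, and file names containing spaces are not handled.

```shell
# check_rules FILE...: lint the named rule files from the rules directory
# using promtool from the prom/prometheus image.
# RULES_DIR and DOCKER are hypothetical override hooks.
check_rules() {
  rules_dir="${RULES_DIR:-/opt/docker-compose/prometheus/conf/rules}"
  args=""
  for f in "$@"; do
    args="$args /rules/$f"
  done
  # $args is intentionally unquoted so that each path is a separate argument.
  "${DOCKER:-docker}" run --rm --entrypoint promtool \
    -v "$rules_dir:/rules:ro" \
    prom/prometheus check rules $args
}

# Example:
#   check_rules redis_alerts.yml mysql_alerts.yml
```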
      

docker-compose Deployment File

Write a Docker Compose file that deploys Prometheus, Alertmanager, and Grafana.

yml
#vi /opt/docker-compose/docker-compose.yml

version: '3'

services:
  prometheus:
    image: prom/prometheus
    network_mode: host
    container_name: prometheus_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    #ports:
    #  - '9090:9090'
    environment:
      TZ: "Asia/Shanghai"
    #command: ["/etc/prometheus/prometheus.yml"]
    volumes:
      - ./prometheus/data/:/prometheus/
      - ./prometheus/conf/rules/:/etc/prometheus/rules/
      - ./prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml

  alertmanager:
    image: prom/alertmanager
    network_mode: host
    container_name: alertmanager_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'
    #ports:
    #  - '9093:9093'
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      - ./prometheus/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      
  grafana:
    image: grafana/grafana
    network_mode: host
    container_name: grafana_1
    restart: unless-stopped
    # if you are running as root then set it to 0, else find the right id with the id -u command
    user: '0'  
    #ports:
    #  - '3000:3000'
    environment:
      - TZ=Asia/Shanghai
      #- GF_INSTALL_PLUGINS=grafana-simple-json-datasource
    volumes:
      - 'grafana_storage:/var/lib/grafana'

# DECLARE DOCKER VOLUME FOR GRAFANA_STORAGE      
volumes:
  grafana_storage: {}

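Run the containers and check them:

bash
# Start the containers
cd /opt/docker-compose/ && docker-compose up -d

# Stop the containers
cd /opt/docker-compose/ && docker-compose stop

Post-deployment checks:

  • Open http://<Prometheus_SERVER_IP>:9090 to check that Prometheus is running.
  • Open http://<Alertmanager_SERVER_IP>:9093 to check that Alertmanager is running.
  • Open http://<Grafana_SERVER_IP>:3000 to check that Grafana is running. The default username and password are both admin, and you are prompted to change the password after logging in.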
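
The same checks can be run from a terminal against each service's health endpoint. The sketch below assumes all three services run on one host, as in the compose file above; `health_urls` and `check_stack` are hypothetical helper names.

```shell
# health_urls HOST: print the health-check URL of each service on HOST.
health_urls() {
  host="$1"
  printf 'http://%s:9090/-/healthy\n' "$host"   # Prometheus
  printf 'http://%s:9093/-/healthy\n' "$host"   # Alertmanager
  printf 'http://%s:3000/api/health\n' "$host"  # Grafana
}

# check_stack HOST: probe every endpoint and report OK or FAIL per URL.
# curl -f turns HTTP error statuses (4xx/5xx) into a non-zero exit code.
check_stack() {
  for url in $(health_urls "$1"); do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}

# Example:
#   check_stack 127.0.0.1
```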

Common Errors

  1. If user: '0' is missing from the Docker Compose file, you may see the following error:
bash
caller=query_logger.go:86 level=error component=activeQueryTracker msg="Error opening query log file" 
file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied
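  2. When environment variables are written in list syntax (one dash per entry), each entry must be a KEY=value string. Using a colon after the dash, as below, turns each list item into a YAML mapping instead of a string and produces the following error. The colon form only works as a plain mapping without dashes, as in the prometheus service above.
yml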
environment:
  - TZ: "Asia/Shanghai"
  - GF_INSTALL_PLUGINS: "grafana-clock-panel,grafana-simple-json-datasource"
...

# Error when starting the containers
panic: interface conversion: interface {} is map[string]interface {}, not string
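  3. If the Docker Compose file tells Grafana to install certain plugins (for example grafana-clock-panel), the Grafana container may restart repeatedly. The container log shows the following error:
bash
$ docker logs -f --tail 100 grafana_1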
...
Error: ✗ Plugin not found (Grafana v8.3.3 linux-amd64)

Configuring Grafana Dashboards

Adding a Data Source

When adding a data source, select Prometheus, paste http://<Prometheus_SERVER_IP>:9090 into the URL field under HTTP, and save.
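
The data source can also be created through Grafana's HTTP API instead of the UI. The sketch below only builds the request body; `datasource_json` is a hypothetical helper name, the data source name "Prometheus" is an arbitrary choice, and the commented curl call assumes basic authentication with the admin account.

```shell
# datasource_json URL: print a JSON payload describing a Prometheus data
# source that Grafana reaches via its backend (access mode "proxy").
datasource_json() {
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1"
}

# Example (replace the placeholders before running):
#   curl -s -u 'admin:<password>' -H 'Content-Type: application/json' \
#     -X POST -d "$(datasource_json 'http://<Prometheus_SERVER_IP>:9090')" \
#     'http://<Grafana_SERVER_IP>:3000/api/datasources'
```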

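Creating a Dashboard

Under New dashboard, click Import dashboard on the right (Create --> Import), enter the dashboard ID and load it, choose Prometheus as the data source under Options, and click Import.

  • Recommended Redis dashboards: 763, 11835
  • Recommended MySQL dashboard: 7362

Custom dashboards will be covered in a separate follow-up article.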
