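1. Preliminaries
- Environment: Prometheus is already deployed in production (assumed to run at 192.168.1.200, port 9090). The Redis cluster runs on several VMs (example: 3 masters and 3 replicas on 192.168.1.101 through 192.168.1.106, Redis port 6379).
- Approach: deploy redis_exporter on each Redis VM to collect metrics, add a scrape job and alert rules to the existing Prometheus, and use promtail to collect the Redis logs. Together these cover the 4 monitoring requirements.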
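- Tools: redis_exporter (v1.55.0, a stable release) and promtail (v2.9.2, for log collection). Note that promtail pushes logs to a Loki-compatible endpoint; Prometheus itself stores metrics and does not ingest logs.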
2. Step 1: Deploy redis_exporter on the Redis VMs (run on every Redis node)
2.1 Download and install redis_exporter

    # 1. On the Redis VM, create a directory for the exporter
    mkdir -p /opt/redis-exporter && cd /opt/redis-exporter
    # 2. Download redis_exporter (Linux x86_64 build; pick the matching release asset for other architectures)
    wget https://github.com/oliver006/redis_exporter/releases/download/v1.55.0/redis_exporter-v1.55.0.linux-amd64.tar.gz
    # 3. Extract the archive
    tar -zxvf redis_exporter-v1.55.0.linux-amd64.tar.gz
    # 4. Enter the extracted directory and confirm the binary runs (it should print version information)
    cd redis_exporter-v1.55.0.linux-amd64
    ./redis_exporter --version

2.2 Run redis_exporter as a systemd service (start on boot)

    # 1. Create the unit file
    vim /usr/lib/systemd/system/redis-exporter.service

    # 2. Unit file contents. Replace [Redis password]; if Redis has no password, delete the --redis.password line.
    [Unit]
    Description=Redis Exporter
    After=network.target

    [Service]
    User=root
    Group=root
    # systemd does not accept a comment after a line-continuation backslash, so the notes for ExecStart sit here:
    #   --redis.addr           address of the Redis node on this VM (each node uses its own IP)
    #   --redis.password       delete this line if Redis has no password
    #   --web.listen-address   exporter port (9121 is the default; keep it the same on all nodes)
    # redis_slowlog_length and redis_connected_clients are exported by default and need no extra flag.
    # Run ./redis_exporter --help to list the optional flags your build supports.
    ExecStart=/opt/redis-exporter/redis_exporter-v1.55.0.linux-amd64/redis_exporter \
      --redis.addr=redis://192.168.1.101:6379 \
      --redis.password=[Redis password] \
      --web.listen-address=:9121

    [Install]
    WantedBy=multi-user.target

    # 3. Reload systemd, then start the service and enable it on boot
    systemctl daemon-reload
    systemctl start redis-exporter
    systemctl enable redis-exporter
    # 4. Check the service status (active (running) means success)
    systemctl status redis-exporter
    # 5. Check that metrics are served locally on port 9121 (the output should include redis_* metrics)
    curl http://192.168.1.101:9121/metrics | grep "redis_connected_clients"

Note: repeat 2.1 and 2.2 on every Redis VM, changing only --redis.addr in ExecStart to that VM's IP.
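
Checking six nodes by eye is error-prone, so the verification in step 5 can be scripted. The following is a minimal sketch, not part of the original procedure: `check_exporter_metrics` is a hypothetical helper name, and it assumes the exporter reports `redis_up 1` when it can reach Redis and exposes `redis_connected_clients` and `redis_slowlog_length`.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# reports whether the metrics needed by the later alert rules are present.
check_exporter_metrics() {
    awk '
        # Skip HELP/TYPE comment lines.
        /^#/ { next }
        # Strip any {labels} part so only the metric name remains,
        # and remember the last value seen for that name.
        { name = $1; sub(/\{.*/, "", name); seen[name] = $NF }
        END {
            status = 0
            if (seen["redis_up"] != "1") { print "FAIL redis_up is not 1"; status = 1 }
            n = split("redis_connected_clients redis_slowlog_length", need, " ")
            for (i = 1; i <= n; i++) {
                if (!(need[i] in seen)) { print "FAIL missing " need[i]; status = 1 }
            }
            if (status == 0) print "OK"
            exit status
        }
    '
}
```

Possible usage on one node: `curl -s http://192.168.1.101:9121/metrics | check_exporter_metrics`. A non-zero exit status makes it easy to loop over all six IPs and stop at the first failing node.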
3. Step 2: Configure Redis metric collection in the existing Prometheus
3.1 Edit the Prometheus configuration file (prometheus.yml)

    # 1. Go to the Prometheus install directory (assumed to be /opt/prometheus)
    cd /opt/prometheus
    # 2. Edit the configuration file
    vim prometheus.yml

    # 3. Under scrape_configs, add a Redis scrape job (every Redis node's exporter must appear in targets)
    scrape_configs:
      # ... existing scrape jobs ...
      # New Redis monitoring job
      - job_name: "redis-cluster-monitor"
        scrape_interval: 10s   # scrape interval (10s suggested for production, for fresher data)
        static_configs:
          - targets: [
              "192.168.1.101:9121",   # exporter on Redis node 1
              "192.168.1.102:9121",   # exporter on Redis node 2
              "192.168.1.103:9121",   # exporter on Redis node 3
              "192.168.1.104:9121",   # exporter on Redis node 4
              "192.168.1.105:9121",   # exporter on Redis node 5
              "192.168.1.106:9121"    # exporter on Redis node 6
            ]
            labels:
              group: "redis-cluster"   # label marking these targets as the Redis cluster

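
When the node IPs form a contiguous range, as in this example, the target list can be generated rather than typed. This is only an illustrative sketch: `gen_targets` is a hypothetical helper, and it emits block-style YAML list items, which are equivalent to the bracketed list above.

```shell
# Hypothetical helper: print one quoted "ip:port" YAML list item per host
# for a contiguous range of last octets.
# Arguments: <network prefix> <first octet> <last octet> <exporter port>
gen_targets() {
    prefix=$1; first=$2; last=$3; port=$4
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '  - "%s.%s:%s"\n' "$prefix" "$i" "$port"
        i=$((i + 1))
    done
}
```

For the example environment, `gen_targets 192.168.1 101 106 9121` prints six list items that can be pasted under `targets:` after adjusting the indentation to match the file.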
3.2 Validate the Prometheus configuration and restart

    # 1. Check the configuration file (promtool reports SUCCESS when the file is valid)
    ./promtool check config prometheus.yml
    # 2. Restart Prometheus (depends on how it is run; systemd is shown here)
    systemctl restart prometheus
    # 3. Check the scrape status in the Prometheus UI (http://192.168.1.200:9090)
    #    - Open Status -> Targets
    #    - Every target under "redis-cluster-monitor" should show UP

4. Step 3: Configure the 4 core monitoring requirements (alerts and logs)
4.1 Requirement 1: IP address changes in the Redis cluster
4.1.1 Step 1: Record the initial node IDs and IPs of the cluster (run on any Redis node)

    # 1. Connect to the cluster (drop -a if there is no password)
    redis-cli -h 192.168.1.101 -p 6379 -a [Redis password]
    # 2. At the redis-cli prompt, list the cluster nodes and record each node ID with its IP
    cluster nodes
    # Example output (record: node ID -> IP)
    # abc123def456... 192.168.1.101:6379@16379 master - 0 1695000000000 1 connected 0-5460
    # def456abc123... 192.168.1.102:6379@16379 master - 0 1695000001000 2 connected 5461-10922

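
Recording the node ID to IP mapping by hand invites copy errors, so it can be extracted from the `cluster nodes` output instead. The sketch below assumes IPv4 addresses in the `ip:port@cport` form shown in the example output; `parse_cluster_nodes` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `CLUSTER NODES` output on stdin and prints
# "<node_id> <ip>" for each node. Assumes IPv4 "ip:port@cport" addresses.
parse_cluster_nodes() {
    awk 'NF >= 2 {
        addr = $2
        sub(/:.*/, "", addr)   # drop everything from the first colon (":port@cport")
        print $1, addr
    }'
}
```

One possible use is to save a baseline, for example `redis-cli -h 192.168.1.101 -p 6379 cluster nodes | parse_cluster_nodes > nodes_baseline.txt` (the file name is illustrative), and compare later runs against it with `diff`.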
4.1.2 Step 2: Create the Prometheus alert rule file (redis_alert_rules.yml)

    # 1. Create the rule file in the Prometheus directory
    vim /opt/prometheus/redis_alert_rules.yml

    # 2. Add the IP-change rule (substitute the real node IDs and IPs).
    #    The rule assumes the exporter exposes a redis_cluster_members series with node_id and ip labels.
    #    Confirm that the series exists in the exporter's /metrics output before relying on this rule.
    groups:
      - name: redis_cluster_ip_change
        rules:
          - alert: RedisClusterNodeIPChanged
            expr: |
              # one clause per node, joined with "or"
              redis_cluster_members{node_id="abc123def456...",ip!="192.168.1.101"} == 1
              or
              redis_cluster_members{node_id="def456abc123...",ip!="192.168.1.102"} == 1
              or
              redis_cluster_members{node_id="ghi789jkl012...",ip!="192.168.1.103"} == 1
            for: 1m   # must hold for 1 minute, to avoid false alarms from network jitter
            labels:
              severity: critical   # critical is suggested for production so it triggers urgent handling
            annotations:
              summary: "Redis cluster node IP changed"
              description: "The IP of node ID {{ $labels.node_id }} has changed. Current IP: {{ $labels.ip }}, original IP: [initial IP of that node]"
              runbook_url: "http://内部运维文档地址/redis-ip-change-handle"   # optional: placeholder for the internal runbook URL

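
Because the expression needs one clause per node, it can be generated from the recorded "node ID, IP" pairs. This is a sketch under the same assumption as the rule itself, namely that a `redis_cluster_members` series with `node_id` and `ip` labels exists; `build_ip_change_expr` is a hypothetical helper name.

```shell
# Hypothetical helper: reads "<node_id> <ip>" pairs on stdin and prints a
# PromQL expression with one clause per node, joined with "or".
build_ip_change_expr() {
    awk 'NF == 2 {
        clause = sprintf("redis_cluster_members{node_id=\"%s\",ip!=\"%s\"} == 1", $1, $2)
        # Append the clause, inserting "or" between clauses but not before the first.
        out = (out == "" ? clause : out "\nor\n" clause)
    }
    END { if (out != "") print out }'
}
```

Feeding it a file of pairs, one per line, yields text that can be pasted under `expr: |` with the appropriate indentation. Regenerating the expression after a planned IP change keeps the rule in step with the cluster.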
4.1.3 Step 3: Reference the rule file from Prometheus

    # 1. Edit prometheus.yml to add the rule_files setting
    vim /opt/prometheus/prometheus.yml

    # 2. Add this top-level block after the global section (if rule_files already exists, just append the path)
    rule_files:
      - "redis_alert_rules.yml"   # new Redis alert rule file
      # - "<other existing rule file>.yml"   # keep the existing rule files

    # 3. Restart Prometheus to apply the change
    systemctl restart prometheus
    # 4. Verify in the Prometheus UI: Status -> Rules should list the redis_cluster_ip_change group

4.2 Requirement 2: Redis slow query monitoring
4.2.1 Step 1: Set the slow query threshold (run on every Redis node)

    # 1. Connect to the Redis node
    redis-cli -h 192.168.1.101 -p 6379 -a [Redis password]
    # 2. At the redis-cli prompt, set the threshold at runtime (10000 microseconds = 10 ms; slower commands are logged)
    config set slowlog-log-slower-than 10000
    # 3. Set the maximum number of slow log entries kept (1000 suggested for production, to bound memory use)
    config set slowlog-max-len 1000
    # 4. To make the settings permanent, edit the Redis config file from a shell
    #    (use the real path; common ones are /etc/redis/redis.conf and /usr/local/redis/redis.conf)
    vim /etc/redis/redis.conf
    # 5. Find and set these directives
    slowlog-log-slower-than 10000
    slowlog-max-len 1000
    # 6. Restart Redis so it rereads the file (systemd shown; CONFIG SET above has already applied the values at runtime)
    systemctl restart redis
    # 7. Back at the redis-cli prompt, verify the settings (they should return 10000 and 1000)
    config get slowlog-log-slower-than
    config get slowlog-max-len

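
After editing redis.conf on several nodes, it helps to confirm what the file will actually apply on the next restart. The sketch below assumes the plain `directive value` layout shown in step 5 and that a later occurrence of a directive overrides an earlier one; `conf_value` is a hypothetical helper name.

```shell
# Hypothetical helper: print the effective value of one directive from a
# redis.conf read on stdin. Comment lines are ignored and, when a directive
# appears more than once, the last occurrence is reported.
conf_value() {
    directive=$1
    awk -v key="$directive" '
        /^[ \t]*#/ { next }       # skip comment lines
        $1 == key { val = $2 }    # remember the most recent value
        END { if (val != "") print val }
    '
}
```

For example, `conf_value slowlog-max-len < /etc/redis/redis.conf` should print 1000 once step 5 is done. The same helper works for other single-value directives in this guide.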
4.2.2 Step 2: Add slow query alert rules (edit redis_alert_rules.yml)

    # 1. Edit the alert rule file
    vim /opt/prometheus/redis_alert_rules.yml

    # 2. Add a slow query rule group under groups.
    #    The scrape job above attaches only instance, job and group labels, so the templates use
    #    {{ $labels.instance }} (for example 192.168.1.101:9121); there is no ip label on these series.
      - name: redis_slowlog_monitor
        rules:
          # Rule 1: total slow log entries above the threshold (20 suggested for production; tune per workload)
          - alert: RedisSlowlogCountExceed
            expr: redis_slowlog_length > 20
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "Redis slow query count exceeded"
              description: "Redis node {{ $labels.instance }} has {{ $value }} slow log entries, above the threshold of 20"
              suggestion: "Run: redis-cli -h [Redis node IP] -p 6379 -a [password] slowlog get to inspect the slow queries"
          # Rule 2: new slow queries within 5 minutes (catches fresh anomalies quickly).
          # redis_slowlog_length is a gauge capped at slowlog-max-len, so increase() on it is unreliable.
          # This rule uses redis_slowlog_last_id, which grows with each new entry; check that your
          # exporter exposes it in /metrics.
          - alert: RedisSlowlogNewIn5m
            expr: increase(redis_slowlog_last_id[5m]) > 5   # fires when more than 5 entries are added in 5 minutes
            for: 10s
            labels:
              severity: info
            annotations:
              summary: "New Redis slow queries"
              description: "Redis node {{ $labels.instance }} logged {{ $value }} new slow queries in 5 minutes"

4.2.3 Step 3: Restart Prometheus to apply

    systemctl restart prometheus
    # Verify: in the Prometheus UI -> Graph, query redis_slowlog_length; a returned value means success

4.3 Requirement 3: Redis connection surge monitoring
4.3.1 Step 1: Set the Redis connection limit (run on every Redis node)

    # 1. Connect to the Redis node
    redis-cli -h 192.168.1.101 -p 6379 -a [Redis password]
    # 2. At the redis-cli prompt, set the limit at runtime
    #    (size it to the server; 1000-2000 is suggested here, while the Redis default is 10000)
    config set maxclients 1500
    # 3. To make it permanent, edit redis.conf from a shell
    vim /etc/redis/redis.conf
    # 4. Find and set the directive (add it if missing)
    maxclients 1500
    # 5. Restart Redis so it rereads the file (CONFIG SET above has already applied the value at runtime)
    systemctl restart redis
    # 6. Back at the redis-cli prompt, verify the setting (it should return 1500)
    config get maxclients

4.3.2 Step 2: Add connection alert rules (edit redis_alert_rules.yml)

    # 1. Edit the alert rule file
    vim /opt/prometheus/redis_alert_rules.yml

    # 2. Add a connection monitoring rule group under groups.
    #    A bare metric name such as {{ redis_connected_clients }} is not valid in an alert template;
    #    templates can use $labels and $value, or the query function to look up other series.
      - name: redis_connection_monitor
        rules:
          # Rule 1: connection surge (more than 100 added within 5 minutes; tune per workload).
          # redis_connected_clients is a gauge, so delta() is used rather than increase().
          - alert: RedisConnectionSurge
            expr: delta(redis_connected_clients[5m]) > 100
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "Redis connection surge"
              description: "Connections on Redis node {{ $labels.instance }} grew by {{ $value }} within 5 minutes"
          # Rule 2: connections close to the limit (above 80%; scale out or investigate)
          - alert: RedisConnectionNearMax
            expr: redis_connected_clients / redis_config_maxclients > 0.8
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Redis connections near the limit"
              description: "Redis node {{ $labels.instance }} is using {{ $value | humanizePercentage }} of its maxclients limit"
              suggestion: "1. Check for abnormal connections; 2. Consider scaling Redis out or raising maxclients"

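
To make the 80% rule concrete: with maxclients set to 1500, the alert condition becomes true once a node holds more than 1200 connections. The sketch below reproduces that arithmetic outside Prometheus; `conn_usage_exceeds` is a hypothetical helper, not something the exporter or Prometheus provides.

```shell
# Hypothetical helper mirroring the RedisConnectionNearMax expression.
# Arguments: <connected clients> <maxclients> <threshold ratio>
# Prints the usage ratio with two decimals. Exit status 0 means the ratio
# is above the threshold, 1 means it is not, 2 means invalid input.
conn_usage_exceeds() {
    awk -v c="$1" -v m="$2" -v t="$3" 'BEGIN {
        if (m <= 0) { print "invalid maxclients"; exit 2 }
        ratio = c / m
        printf "%.2f\n", ratio
        exit ((ratio > t) ? 0 : 1)
    }'
}
```

For instance, `conn_usage_exceeds 1300 1500 0.8` prints 0.87 and exits 0, while `conn_usage_exceeds 600 1500 0.8` prints 0.40 and exits 1. Trying a few realistic connection counts this way is a cheap sanity check on the chosen threshold before the rule goes live.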
4.3.3 Step 3: Restart Prometheus to apply

    systemctl restart prometheus
    # Verify: in the Prometheus UI -> Graph, query redis_connected_clients; seeing the live connection count means success

4.4 Requirement 4: Redis log content (monitoring key errors)
4.4.1 Step 1: Configure Redis log output (run on every Redis node)

    # 1. Edit the Redis config file so that logs go to a file (not only to the console)
    vim /etc/redis/redis.conf

    # 2. Find and set the directives below (the log path must exist and be writable).
    #    redis.conf does not support trailing comments, so keep each comment on its own line.
    # Log file path
    logfile "/var/log/redis/redis-server.log"
    # Log level: notice (recommended for production; includes errors and warnings)
    loglevel notice

    # 3. Create the log directory and set ownership (if the path does not exist)
    mkdir -p /var/log/redis
    chown redis:redis /var/log/redis   # the user Redis runs as (adjust to your setup)
    # 4. Restart Redis so it creates the log file
    systemctl restart redis
    # 5. Verify: check that the log file has content
    tail -f /var/log/redis/redis-server.log

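
Before wiring up log collection, it can be useful to see which lines in the file are warnings. The sketch below assumes the standard Redis log line layout, `pid:role day month year time level message`, where the level is a single character and `#` marks a warning; `redis_warning_lines` is a hypothetical helper and the sample lines in the usage note are made up for illustration.

```shell
# Hypothetical helper: reads a Redis log on stdin and prints only the
# warning-level lines, identified by "#" in the sixth whitespace-separated
# field (after pid:role, day, month, year and time).
redis_warning_lines() {
    awk '$6 == "#"'
}
```

Given the two lines `1234:M 18 Sep 2023 10:00:00.000 * Ready to accept connections` and `1234:M 18 Sep 2023 10:00:05.000 # WARNING overcommit_memory is set to 0`, only the second is printed. A quick look at recent warnings could then be `redis_warning_lines < /var/log/redis/redis-server.log | tail -n 20`.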
4.4.2 Step 2: Deploy promtail on the Redis VMs (log collection)

    # 1. On the Redis VM, create a directory for promtail
    mkdir -p /opt/promtail && cd /opt/promtail
    # 2. Download promtail (Linux x86_64 build)
    wget https://github.com/grafana/loki/releases/download/v2.9.2/promtail-linux-amd64.zip
    # 3. Extract it (install unzip first if needed: yum install -y unzip)
    unzip promtail-linux-amd64.zip
    # 4. Create the promtail configuration file (promtail-config.yml)
    vim promtail-config.yml

    # 5. Write the configuration (replace [PrometheusIP] with the actual server address;
    #    note that promtail pushes to a Loki-compatible endpoint)
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    positions:
      filename: /tmp/promtail-positions