Redis Sentinel High-Availability Cluster: Complete Guide
I. Architecture and Principles
1.1 Cluster Topology
┌──────────────────────────────────────────────────────────┐
│                Docker Network (redis-net)                │
│                    ┌──────────────┐                      │
│                    │ Redis Master │                      │
│                    │    :6379     │                      │
│                    └──────────────┘                      │
│         ▲ monitor          │ replication                 │
│         │                  ▼                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │  Sentinel 1  │  │ Redis Slave1 │  │ Redis Slave2 │    │
│  │    :26379    │  │    :6379     │  │    :6379     │    │
│  └──────────────┘  └──────────────┘  └──────────────┘    │
│  ┌──────────────┐                                        │
│  │  Sentinel 2  │                                        │
│  │    :26380    │                                        │
│  └──────────────┘                                        │
│  ┌──────────────┐                                        │
│  │  Sentinel 3  │                                        │
│  │    :26381    │                                        │
│  └──────────────┘                                        │
└──────────────────────────────────────────────────────────┘
All three Sentinels monitor the master and both replicas, and both replicas replicate from the master. The Sentinel ports shown are the host-mapped ports; inside redis-net every Sentinel listens on 26379.
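1.2 How Sentinel Works
- Monitoring: each Sentinel periodically sends PING to the master and the replicas to check their health.
- Subjective down (SDOWN): a single Sentinel marks a node as down when it receives no valid reply to PING within down-after-milliseconds.
- Objective down (ODOWN): when at least quorum Sentinels agree that the master is SDOWN, the master is marked ODOWN, which triggers a failover.
- Automatic failover: the Sentinels elect a leader, which must be authorized by a majority of all Sentinels. The leader promotes one replica to be the new master and reconfigures the remaining replicas to replicate from it. When the old master comes back, it is reconfigured as a replica.
- Configuration provider: clients ask a Sentinel for the current master address, so they can follow a failover without manual reconfiguration.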
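The quorum (needed to declare ODOWN) and the majority (needed to authorize the failover leader) are two separate thresholds, which is easy to overlook. The sketch below expresses both checks as shell arithmetic. The function names `majority`, `is_odown`, and `can_elect_leader` are illustrative helpers, not Redis commands.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; none of these names exist in Redis itself.

# majority TOTAL: smallest number of Sentinels that forms a majority of TOTAL.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# is_odown SDOWN_VOTES QUORUM: succeeds when enough Sentinels agree on SDOWN.
is_odown() {
  [ "$1" -ge "$2" ]
}

# can_elect_leader VOTES TOTAL QUORUM: succeeds when a candidate's votes reach
# both the majority of all Sentinels and the configured quorum.
can_elect_leader() {
  [ "$1" -ge "$(majority "$2")" ] && [ "$1" -ge "$3" ]
}
```

With 3 Sentinels and a quorum of 2, as in this deployment, `is_odown 2 2` and `can_elect_leader 2 3 2` both succeed, so the cluster tolerates the loss of one Sentinel.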
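1.3 Key Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| down-after-milliseconds | 30000 ms | Time without a valid reply before a Sentinel marks a node SDOWN |
| failover-timeout | 180000 ms | Failover timeout |
| parallel-syncs | 1 | Number of replicas that resync with the new master at the same time after a failover |
| quorum | none (last argument of sentinel monitor) | Number of Sentinels that must agree before the master is marked ODOWN; this deployment uses 2 |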
II. Deployment and Configuration
2.1 Requirements
- Operating system: CentOS 7+ / Ubuntu 18.04+
- Software: Docker 20.10+, Docker Compose 2.0+
- Image: docker.m.daocloud.io/library/redis:latest (actual version 8.6.2)
2.2 Directory Layout
/home/redis/redis-sentinel/
├── docker-compose.yml
├── master/
│   └── redis.conf
├── slave1/
│   └── redis.conf
├── slave2/
│   └── redis.conf
├── sentinel1/
│   └── sentinel.conf
├── sentinel2/
│   └── sentinel.conf
├── sentinel3/
│   └── sentinel.conf
└── data/
    ├── master/
    ├── slave1/
    └── slave2/
2.3 Configuration Files
Master node: master/redis.conf
ini
bind 0.0.0.0
port 6379
appendonly yes
appendfilename "appendonly.aof"
dir /data
Replica nodes: slave1/redis.conf and slave2/redis.conf
ini
bind 0.0.0.0
port 6379
appendonly yes
appendfilename "appendonly.aof"
dir /data
replicaof master 6379
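Sentinel configuration: sentinel1/sentinel.conf (identical for all three Sentinels)
ini
port 26379
dir /tmp
# Key setting: enable hostname (DNS) resolution. Comments must stay on their own
# line because Redis config files do not support trailing comments.
sentinel resolve-hostnames yes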
sentinel monitor mymaster master 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
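Because the three Sentinel files are identical and Sentinel rewrites its own configuration file at runtime, it can be convenient to generate them from a single template, both for the first deployment and for later resets. The following is a minimal sketch under that assumption. The helper names `render_sentinel_conf` and `write_sentinel_confs` are hypothetical, and the values mirror the configuration above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for generating the three identical sentinel.conf files.

# render_sentinel_conf NAME HOST QUORUM: prints a sentinel.conf to stdout.
render_sentinel_conf() {
  local name="$1" host="$2" quorum="$3"
  cat <<EOF
port 26379
dir /tmp
# Enable hostname resolution so the master can be referenced by name
sentinel resolve-hostnames yes
sentinel monitor ${name} ${host} 6379 ${quorum}
sentinel down-after-milliseconds ${name} 30000
sentinel failover-timeout ${name} 180000
sentinel parallel-syncs ${name} 1
EOF
}

# write_sentinel_confs BASE_DIR: writes one copy into each sentinelN directory.
write_sentinel_confs() {
  local base="$1" i
  for i in 1 2 3; do
    mkdir -p "${base}/sentinel${i}"
    render_sentinel_conf mymaster master 2 > "${base}/sentinel${i}/sentinel.conf"
  done
}
```

For this deployment the call would be `write_sentinel_confs /home/redis/redis-sentinel`, run while the Sentinel containers are stopped.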
2.4 Docker Compose File
yaml
services:
  master:
    image: docker.m.daocloud.io/library/redis:latest
    container_name: redis-master
    restart: always
    ports:
      - "6379:6379"
    volumes:
      - ./master/redis.conf:/usr/local/etc/redis/redis.conf
      - ./data/master:/data
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      - redis-net
  slave1:
    image: docker.m.daocloud.io/library/redis:latest
    container_name: redis-slave1
    restart: always
    ports:
      - "6380:6379"
    volumes:
      - ./slave1/redis.conf:/usr/local/etc/redis/redis.conf
      - ./data/slave1:/data
    depends_on:
      - master
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      - redis-net
  slave2:
    image: docker.m.daocloud.io/library/redis:latest
    container_name: redis-slave2
    restart: always
    ports:
      - "6381:6379"
    volumes:
      - ./slave2/redis.conf:/usr/local/etc/redis/redis.conf
      - ./data/slave2:/data
    depends_on:
      - master
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      - redis-net
  sentinel1:
    image: docker.m.daocloud.io/library/redis:latest
    container_name: redis-sentinel1
    restart: always
    ports:
      - "26379:26379"
    volumes:
      - ./sentinel1/sentinel.conf:/usr/local/etc/redis/sentinel.conf
    command: ["redis-sentinel", "/usr/local/etc/redis/sentinel.conf"]
    depends_on:
      - master
      - slave1
      - slave2
    networks:
      - redis-net
  sentinel2:
    image: docker.m.daocloud.io/library/redis:latest
    container_name: redis-sentinel2
    restart: always
    ports:
      - "26380:26379"
    volumes:
      - ./sentinel2/sentinel.conf:/usr/local/etc/redis/sentinel.conf
    command: ["redis-sentinel", "/usr/local/etc/redis/sentinel.conf"]
    depends_on:
      - master
      - slave1
      - slave2
    networks:
      - redis-net
  sentinel3:
    image: docker.m.daocloud.io/library/redis:latest
    container_name: redis-sentinel3
    restart: always
    ports:
      - "26381:26379"
    volumes:
      - ./sentinel3/sentinel.conf:/usr/local/etc/redis/sentinel.conf
    command: ["redis-sentinel", "/usr/local/etc/redis/sentinel.conf"]
    depends_on:
      - master
      - slave1
      - slave2
    networks:
      - redis-net
networks:
  redis-net:
    driver: bridge
2.5 Deployment Steps
bash
# 1. Create the directories
mkdir -p /home/redis/redis-sentinel/{master,slave1,slave2,sentinel{1,2,3},data/{master,slave1,slave2}}
# 2. Write the configuration files (using the contents above)
# 3. Pull the image
docker pull docker.m.daocloud.io/library/redis:latest
# 4. Start the cluster
cd /home/redis/redis-sentinel
docker-compose up -d
# 5. Verify the state
docker-compose ps
docker exec -it redis-master redis-cli INFO replication
docker exec -it redis-sentinel1 redis-cli -p 26379 SENTINEL master mymaster
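The verification commands in step 5 can fail if they run before the containers have finished starting. One way to handle this is a small retry helper, sketched below. The name `wait_until` is hypothetical, and the one-second polling interval is an arbitrary choice.

```bash
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds or the timeout is reached.
# Returns 0 on success and 1 on timeout. Hypothetical helper, not part of Docker or Redis.
wait_until() {
  local timeout="$1" elapsed=0
  shift
  until "$@" >/dev/null 2>&1; do
    elapsed=$(( elapsed + 1 ))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example usage against this deployment (requires the running containers):
#   wait_until 30 docker exec redis-master redis-cli PING
#   wait_until 30 docker exec redis-sentinel1 redis-cli -p 26379 PING
```

The helper only checks the exit status of the command, so it suits liveness probes such as `PING`. Checks that depend on command output need a wrapper function that returns a non-zero status on mismatch.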
III. Operations
3.1 Health Check Script
Create /home/redis/redis-sentinel/health_check.sh:
bash
#!/bin/bash
# Run from the project directory so docker-compose finds docker-compose.yml
cd "$(dirname "$0")" || exit 1
echo "====== Redis Sentinel health check ======"
# Number of running containers
RUNNING=$(docker-compose ps --services --filter "status=running" | wc -l)
echo "Running containers: $RUNNING / 6"
# Master role and replica count
MASTER_ROLE=$(docker exec redis-master redis-cli INFO replication 2>/dev/null | grep "^role:" | cut -d: -f2 | tr -d '\r')
SLAVE_CNT=$(docker exec redis-master redis-cli INFO replication 2>/dev/null | grep "^connected_slaves:" | cut -d: -f2 | tr -d '\r')
echo "Master role: $MASTER_ROLE, replicas: $SLAVE_CNT"
# Replication state of each replica
for slave in slave1 slave2; do
  LINK=$(docker exec redis-$slave redis-cli INFO replication 2>/dev/null | grep "^master_link_status:" | cut -d: -f2 | tr -d '\r')
  LAG=$(docker exec redis-$slave redis-cli INFO replication 2>/dev/null | grep "^master_last_io_seconds_ago:" | cut -d: -f2 | tr -d '\r')
  echo "$slave: link=$LINK, lag=${LAG}s"
done
# Master address as seen by Sentinel
MASTER_IP=$(docker exec redis-sentinel1 redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster 2>/dev/null | head -1 | tr -d '\r')
echo "Master reported by Sentinel: $MASTER_IP:6379"
# Replication smoke test. It writes to redis-master, so it only passes
# while the redis-master container holds the master role.
docker exec redis-master redis-cli SET _health "ok" >/dev/null 2>&1
sleep 1
SLAVE1_VAL=$(docker exec redis-slave1 redis-cli GET _health 2>/dev/null | tr -d '\r')
if [ "$SLAVE1_VAL" = "ok" ]; then
  echo "✅ Replication OK"
else
  echo "❌ Replication failed"
fi
Run it: chmod +x health_check.sh && ./health_check.sh
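The script extracts several fields from `INFO` output with `grep | cut | tr` pipelines. A single exact-match helper keeps the extraction consistent and avoids accidental partial matches on similarly named fields. The sketch below is one possible shape; the function name `info_field` is hypothetical.

```bash
#!/usr/bin/env bash
# info_field FIELD
# Reads `redis-cli INFO` output on stdin and prints the value of the first line
# whose key is exactly FIELD. Prints nothing when the field is absent.
# Hypothetical helper, not part of redis-cli.
info_field() {
  tr -d '\r' | awk -v key="$1" '
    index($0, key ":") == 1 && !seen {
      print substr($0, length(key) + 2)   # everything after "FIELD:"
      seen = 1
    }'
}

# Example usage against this deployment (requires the running containers):
#   docker exec redis-slave1 redis-cli INFO replication | info_field master_link_status
```

Matching on `FIELD:` at the start of the line means `master_link_status` can never match `master_link_down_since_seconds`, and stripping `\r` first handles the CRLF line endings that `INFO` returns.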
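3.2 Failover Test
bash
# Stop the master
docker stop redis-master
# Watch the Sentinel log (in another terminal)
docker logs -f redis-sentinel1 | grep -E "sdown|odown|switch-master"
# Wait at least 30 seconds (down-after-milliseconds is 30000), then check the new master
docker exec -it redis-sentinel1 redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# Restart the old master; Sentinel reconfigures it as a replica
docker start redis-master
docker exec -it redis-master redis-cli INFO replication | grep role # expected: role:slave (may take a few seconds)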
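To record what a failover test actually did, the `+switch-master` events can be pulled out of the Sentinel log. The sketch below assumes the event format documented for Sentinel, `+switch-master <master name> <old ip> <old port> <new ip> <new port>`, preceded by the usual log prefix. The function name `parse_switch_master` is hypothetical.

```bash
#!/usr/bin/env bash
# parse_switch_master
# Reads Sentinel log lines on stdin and, for every +switch-master event, prints
#   MASTER_NAME OLD_HOST:OLD_PORT -> NEW_HOST:NEW_PORT
# Hypothetical helper; it relies only on the five fields that follow the event name.
parse_switch_master() {
  tr -d '\r' | awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "+switch-master" && i + 5 <= NF) {
        printf "%s %s:%s -> %s:%s\n", $(i + 1), $(i + 2), $(i + 3), $(i + 4), $(i + 5)
      }
    }
  }'
}

# Example usage against this deployment (requires the running containers):
#   docker logs redis-sentinel1 2>&1 | parse_switch_master
```

Scanning for the event name rather than a fixed column keeps the helper independent of the timestamp prefix, which varies in width.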
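3.3 Routine Inspection Checklist
| Item | Command | Healthy when |
|---|---|---|
| Container status | docker-compose ps | All 6 containers are Up |
| Replication | docker exec redis-master redis-cli INFO replication | connected_slaves:2 and every replica reports state=online |
| Replication delay | master_last_io_seconds_ago (INFO replication on each replica) | < 5 seconds |
| Sentinel quorum | SENTINEL sentinels mymaster | Lists the 2 other Sentinels |
| Memory usage | INFO memory (used_memory_human) | Below 80% of physical memory |
| Hit rate | INFO stats: keyspace_hits / (keyspace_hits + keyspace_misses) | > 90% recommended |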
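Redis does not report the hit rate directly; it has to be derived from the `keyspace_hits` and `keyspace_misses` counters as hits / (hits + misses). A minimal sketch of that calculation follows. The function name `hit_rate` is hypothetical.

```bash
#!/usr/bin/env bash
# hit_rate
# Reads `redis-cli INFO stats` output on stdin and prints the keyspace hit rate
# as a percentage with two decimals, or "n/a" when no lookups were recorded.
# Hypothetical helper, not part of redis-cli.
hit_rate() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) {
        print "n/a"
      } else {
        printf "%.2f\n", 100 * hits / total
      }
    }'
}

# Example usage against this deployment (requires the running containers):
#   docker exec redis-master redis-cli INFO stats | hit_rate
```

Both counters are cumulative since the last restart or `CONFIG RESETSTAT`, so the result is a long-run average rather than a current rate.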
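3.4 Log Management
bash
# Follow the logs of all services
docker-compose logs -f
# Logs of a single container
docker logs -f redis-sentinel1 --tail 100
# Filter failover events
docker logs redis-sentinel1 2>&1 | grep -E "sdown|odown|failover|switch-master"
# Configure log rotation in /etc/docker/daemon.json
# (restart the Docker daemon afterwards; the setting applies to newly created containers only)
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}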
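3.5 Backup and Restore
bash
# Trigger an RDB snapshot manually. BGSAVE runs in the background, so wait until
# rdb_bgsave_in_progress in INFO persistence returns to 0 before copying files.
docker exec -it redis-master redis-cli BGSAVE
# Back up the data directories
tar -czf backup-$(date +%Y%m%d).tar.gz /home/redis/redis-sentinel/data/
# Restore: stop the containers, replace the data directory, restart.
# With appendonly yes, Redis loads the AOF at startup, so the restored
# directory must include the AOF files, not only dump.rdb.
docker-compose down
rm -rf data/master/*
cp -r /backup/redis-data/* data/master/
docker-compose up -d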
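Daily tarballs named `backup-YYYYMMDD.tar.gz` accumulate without limit unless old ones are removed. The sketch below keeps only the newest few, relying on the fact that this naming scheme sorts chronologically. The function name `prune_backups` and the retention count are assumptions, not part of the original procedure.

```bash
#!/usr/bin/env bash
# prune_backups DIR KEEP
# Deletes all but the KEEP newest backup-YYYYMMDD.tar.gz files in DIR.
# Hypothetical helper; it depends on the date-stamped names sorting oldest first.
prune_backups() {
  local dir="$1" keep="$2" excess
  set -- "$dir"/backup-*.tar.gz        # glob expands in sorted order, oldest first
  [ -e "$1" ] || return 0              # nothing matched, nothing to prune
  excess=$(( $# - keep ))
  while [ "$excess" -gt 0 ]; do
    rm -f -- "$1"
    shift
    excess=$(( excess - 1 ))
  done
}

# Example usage: keep the 7 most recent backups in the current directory
#   prune_backups . 7
```

Sorting by file name rather than modification time means that copying or touching an old archive does not change which files are considered newest.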
IV. Monitoring and Alerting with Prometheus
4.1 Deploying Redis Exporter
Add the following service under services in docker-compose.yml:
yaml
  redis-exporter:
    image: oliver006/redis_exporter:latest
    container_name: redis-exporter
    restart: always
    ports:
      - "9121:9121"
    environment:
      - REDIS_ADDR=redis-master:6379
    networks:
      - redis-net
    depends_on:
      - master
After startup, open http://<host-IP>:9121/metrics to verify.
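The same verification can be scripted by reading a single metric out of the exporter's text output. The sketch below assumes the standard Prometheus text exposition format without timestamps. The function name `metric_value` is hypothetical, and `redis_up` is the exporter's own availability gauge.

```bash
#!/usr/bin/env bash
# metric_value NAME
# Reads Prometheus text exposition on stdin and prints the value of the first
# sample whose metric name is exactly NAME (any label set is ignored).
# Hypothetical helper; assumes samples carry no trailing timestamp.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE lines
    {
      metric = $1
      sub(/[{].*/, "", metric)          # drop the label set, if any
      if (metric == name && !seen) {
        print $NF
        seen = 1
      }
    }'
}

# Example usage against this deployment (requires the running exporter):
#   curl -s http://localhost:9121/metrics | metric_value redis_up
```

A value of 1 for `redis_up` indicates that the exporter reached the Redis instance configured in `REDIS_ADDR`.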
4.2 Prometheus Configuration
Add a job to the Prometheus configuration file prometheus.yml:
yaml
scrape_configs:
  - job_name: 'redis-sentinel'
    static_configs:
      - targets: ['<host-IP>:9121']
    metrics_path: '/metrics'
    scrape_interval: 15s
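4.3 Key Metrics
| Metric | Meaning | Alert threshold |
|---|---|---|
| redis_connected_slaves | Number of connected replicas | < 2 |
| redis_master_last_io_seconds_ago | Seconds since the replica last heard from the master | > 10 |
| redis_master_link_status | Replication link state (1 = up) | == 0 |
| redis_memory_used_bytes | Used memory | > 80% of maxmemory |
| redis_connected_clients | Connected clients | Site-specific |
| redis_commands_processed_total | Commands processed (alert on its rate) | Sudden drop |
| redis_rejected_connections_total | Rejected connections | > 0 |
Metric names can differ between redis_exporter versions, so confirm each name against the /metrics output before relying on it. The two redis_master_* metrics describe the replica side of the link and are reported only when the exporter scrapes a replica, whereas the exporter in 4.1 targets the master.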
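4.4 Example Alert Rules (alert.rules.yml)
yaml
groups:
  - name: redis_alerts
    rules:
      - alert: RedisMasterDown
        # redis_up is 0 when the exporter cannot reach its target (redis-master:6379 in 4.1)
        expr: redis_up == 0
        for: 1m
        annotations:
          summary: "Redis master is down"
          description: "The master is unreachable; check whether Sentinel has failed over"
      - alert: RedisReplicationLag
        expr: redis_master_last_io_seconds_ago > 10
        for: 30s
        annotations:
          summary: "Redis replication delay is too high"
          description: "Replica sync delay is {{ $value }} seconds"
      - alert: RedisMemoryHigh
        # Guarded so it stays silent when maxmemory is unset (0); the metric name
        # redis_memory_max_bytes should be confirmed against the /metrics output
        expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8 and redis_memory_max_bytes > 0
        for: 5m
        annotations:
          summary: "Redis memory usage is above 80%"
      - alert: SentinelQuorumLost
        # Requires an exporter target that scrapes a Sentinel (port 26379); the metric
        # name is an assumption and should be confirmed against the /metrics output
        expr: redis_sentinel_master_ok_sentinels < 2
        for: 1m
        annotations:
          summary: "Sentinel quorum lost"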
4.5 Grafana Dashboard
A published Redis dashboard from grafana.com (ID 763 or 11835) is the recommended starting point. To import it:
- Grafana → Create → Import
- Enter the dashboard ID 763
- Select the Prometheus data source
- Click Import
4.6 Alert Notifications (Alertmanager)
yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://your-webhook-url'
4.7 One-Step Monitoring Stack Deployment (Optional)
Deploy Prometheus and Grafana with docker-compose:
yaml
version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
V. Appendix: Command Quick Reference
5.1 Docker Compose Management
bash
# Start the cluster
docker-compose up -d
# Stop the cluster
docker-compose down
# Restart all services
docker-compose restart
# Restart a single service
docker-compose restart sentinel1
# View logs
docker-compose logs -f [service]
# Open a shell in a container
docker exec -it redis-master bash
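5.2 Redis CLI Operations
bash
# CLI on the master
docker exec -it redis-master redis-cli
# CLI on a replica
docker exec -it redis-slave1 redis-cli
# CLI on a Sentinel
docker exec -it redis-sentinel1 redis-cli -p 26379
# Common commands (entered at the redis-cli prompt)
INFO replication # replication info
INFO memory # memory info
INFO stats # statistics
SENTINEL master mymaster # show the master
SENTINEL slaves mymaster # show the replicas
SENTINEL sentinels mymaster # show the other Sentinels
CONFIG GET * # show the configuration
SLOWLOG GET 10 # slow queries
CLIENT LIST # client list
MONITOR # stream every command in real time (use with care in production)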
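5.3 Troubleshooting
bash
# Check connectivity between containers
# (the Redis image may not include ping or netstat, so redis-cli is used instead)
docker exec -it redis-sentinel1 redis-cli -h redis-master -p 6379 PING
# Check that Redis answers on its port inside the container
docker exec -it redis-master redis-cli -p 6379 PING
# Copy out the RDB file
docker cp redis-master:/data/dump.rdb ./backup.rdb
# Reset the Sentinel configuration (if stale state appears).
# Sentinel persists its state in sentinel.conf itself, so restore the
# original contents from section 2.3 before starting again.
docker-compose down
rm -f sentinel*/sentinel.conf.bak
docker-compose up -d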
5.4 Performance Testing
bash
# Built-in benchmark tool
docker exec -it redis-master redis-benchmark -c 50 -n 10000 -q
# Benchmark specific commands
docker exec -it redis-master redis-benchmark -t set,get -n 100000
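5.5 Cleanup and Reset
bash
# Remove the cluster completely (including data)
docker-compose down -v
sudo rm -rf data/*
# Reset only the Sentinel state (data is kept)
docker-compose stop sentinel1 sentinel2 sentinel3
rm -f sentinel*/sentinel.conf
# Recreate each sentinel.conf with the original contents from section 2.3, then start
docker-compose start sentinel1 sentinel2 sentinel3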
Document Version
- Version: 1.0
- Last updated: 2026-04-24
- Redis version: 8.6.2
- Docker image: docker.m.daocloud.io/library/redis:latest